Node-to-node data distribution

ABSTRACT

Node-to-Node data distribution is described herein. A node may receive a set of peer nodes from a collection authority node that is managing a collection. The node and the set of peer nodes are members of the collection. The node may select a subset of peer nodes from the set of peer nodes. The node may attempt to establish communications with each of the subset of peer nodes, connected peers being those peers where the attempt was successful. The node may synchronize an event stream with each connected peer.

CLAIM OF PRIORITY

This patent application claims the benefit of priority, under 35 U.S.C. § 119, to U.S. Provisional Patent Application Ser. No. 62/057,492, titled “Data Distribution System,” filed on Sep. 30, 2014, which is hereby incorporated by reference herein in its entirety.

BACKGROUND

Data distribution mechanisms keep file system elements, such as files and directories of computer systems, synchronized across multiple nodes. Thus, for example, if a file is modified on a first device, those changes may be propagated to a second node. Typically, this synchronization happens across a network, such as a Local Area Network (LAN), a Wide Area Network (WAN) such as the Internet, a Cellular Network, or a combination of networks. The synchronization may be across nodes owned by a common user as well as across nodes owned by different users.

BRIEF DESCRIPTION OF DRAWINGS

Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.

FIG. 1 is a block diagram of an example of a system implementing a data distribution mechanism, according to an example.

FIG. 2 is a relationship diagram of an example of a data distribution mechanism collection schema, according to an example.

FIG. 3 is a diagram of an example of a file system with multiple collections, according to an example.

FIG. 4 is a diagram of an example of a file system including a collection from a different node, according to an example.

FIGS. 5A and 5B are block diagrams of examples of participant nodes, according to an example.

FIG. 6 is a network communication diagram of an example data distribution system, according to an example.

FIG. 7 is a diagram of an example implementation of a participant node network, according to an example.

FIG. 8 is a diagram of an example implementation of a participant node network with a multi-segment physical network, according to an example.

FIG. 9 is a diagram of an example of different participant node classifications for a node set based on connection information, according to an example.

FIGS. 10A and 10B are block diagrams of examples of store point nodes, according to an example.

FIG. 11 is a network communication diagram of an example data distribution system including both store point nodes and endpoint nodes, according to an example.

FIG. 12 is a diagram of an example implementation of a participant node network including store point nodes and endpoint nodes, according to an example.

FIG. 13 is a diagram of an example implementation of a participant node network with endpoint and store point nodes in a multi-segment physical network, according to an example.

FIG. 14 is a flowchart that illustrates a method of connecting to participant nodes, according to an example.

FIG. 15A is a diagram that illustrates a connection between two participant nodes, according to an example.

FIG. 15B is a diagram that illustrates a connection between three participant nodes, according to an example.

FIG. 15C is a diagram that illustrates a connection between four participant nodes, according to an example.

FIG. 16 is a flowchart that illustrates a method of connecting to other participant nodes, according to an example.

FIG. 17 is a block diagram that illustrates modules at a participant node and an authority peer management service that implement the participant node connections, according to an example.

FIG. 18 is a diagram that illustrates a block tracking and indexing mechanism used in nodes implementing a block-based data distribution mechanism, according to an example.

FIG. 19 is a diagram that illustrates a relational data schema for a versioned data store, according to an example.

FIG. 20 is a diagram that illustrates processing for a file system event using block deduplication techniques for a file, according to an example.

FIG. 21 is a diagram that illustrates processing for a file system event using block deduplication techniques for a transfer of file blocks, according to an example.

FIG. 22 is a flowchart that illustrates client-side node operations for performing a file replication using block deduplication techniques, according to an example.

FIG. 23 is a flowchart that illustrates serving-side node operations for performing a file replication using block deduplication techniques, according to an example.

FIG. 24 is a block diagram that illustrates a system configured for performing a file replication using block deduplication techniques, according to an example.

FIG. 25 is a set diagram illustrating elements of a collection compared to elements of a locally available set, according to an example.

FIG. 26 is a system diagram illustrating a data distribution system utilizing predictive storage, according to an example.

FIG. 27 is a flowchart of a method illustrating a data distribution system utilizing predictive storage, according to an example.

FIG. 28 is a block diagram that illustrates a system configured for performing predictive storage techniques, according to an example.

FIG. 29 is a functional block diagram that illustrates event coordination and processing in a node of a distributed data system, according to an example.

FIG. 30A is an example user interface that illustrates collection management operations in a file system viewer, according to an example.

FIG. 30B is an example user interface that illustrates collection management operations in a collection graphical user interface, according to an example.

FIG. 31A is a sequence diagram that illustrates operations to create a collection, according to an example.

FIG. 31B is a sequence diagram that illustrates operations to create and synchronize a collection, according to an example.

FIG. 32 is a flowchart of a method illustrating creation of a collection, according to an example.

FIG. 33 is a flowchart of a method illustrating a synchronization of a collection, according to an example.

FIG. 34 is a flowchart of a method illustrating a synchronization of a collection, according to an example.

FIG. 35 is a block diagram that illustrates components of a machine according to various examples of the present disclosure.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of an example of a system 100 implementing a data distribution mechanism. The system 100 can include a plurality of nodes (e.g., Nodes 105 and 140) and an Authority 120. Each of the Node 105, the Node 140, and the Authority 120 is implemented on a machine (e.g., computing device, cluster of computing devices, etc.), such as that described below with respect to FIG. 35. As further described herein, this data distribution mechanism may be used in connection with data synchronization, sharing, backup, archiving, and versioning operations for a plurality of connected machines on behalf of one or a plurality of users.

The Node 105 can include a data distribution mechanism 110 and a data store 115. The data store 115 includes data corresponding to file system elements. As used herein, a file system element is one of a directory (e.g., folder) or a file. File system elements can include meta data and content. For example, in the case of a directory, the meta data can include a name, path, creation date or time, modification date or time, deletion date or time, permissions, corresponding markers (e.g., icons), etc., and the content can include the files or directories contained within the directory. In the case of a file, the meta data can include all of the meta data described above for a directory and also include application affiliation (e.g., what application created the file, what application can read the file, etc.), and the content can include the bits of the file. Examples of the data store 115 can include a file system, a database, or other storage mechanisms in which file system element content and file system element meta data can be stored.

The data distribution mechanism 110 can be coupled to or integrated with the data store 115 when in operation. In an example, the communications between the data distribution mechanism 110 and the data store 115 can include notification of file system element events (e.g., create, read, write, delete, move, rename, etc.). In an example, this notification can be effectuated by monitoring file system events. For example, changes made by applications on the Node 105 to data on the data store 115 cause a notification to be sent by a file system of the data store 115 to the data distribution mechanism 110. In an example, the data distribution mechanism 110 may monitor the data store 115 to detect changes made by applications over time (e.g., comparing the blocks of file A at time T1 with the blocks of file A at time T2).

In the previous examples, changes to data on the data store 115 are detected by interfacing with the file system. In other examples, applications may directly request file system element actions (e.g., create, read, write, move, rename, etc.) from the data distribution mechanism 110. In these examples, the coupling or integration with the data distribution mechanism can include access to a storage medium underlying the data store 115. For example, the data store 115 can be a file system in which the data is stored to a hard disk drive. In this example, the data distribution mechanism 110 can analyze the blocks stored on the hard disk drive without using the file system.

The data distribution mechanism 110 can also be coupled to another Node 140, for example, to the data distribution mechanism 145 on the Node 140, when in operation. In an example, the coupling or integration with the data distribution mechanism can occur via any available network connection, such as a wired or wireless connection. In an example, the coupling or integration with the data distribution mechanism further includes an encrypted tunnel operating over a physical link between the Node 105 and the Node 140. The encrypted tunnel can include use of asymmetric or symmetric encryption, for example using 256-bit AES transport security. In the example of symmetric encryption, the key can be shared between the Node 105 and the Node 140 via, for example, the Authority 120. In the example of asymmetric encryption, standard public key cryptographic techniques can be used. As used herein, the physical link can include physical layers, media access layers, or application layers below the application layer for the data distribution mechanism. Thus, a Transmission Control Protocol/Internet Protocol (TCP/IP) connection over Ethernet can be considered a physical link.

In an example, the coupling can include two logical channels, a data channel 130 and an event channel 135, operating over the same physical link between the two Nodes 105 and 140. In an example, the data channel 130 and the event channel 135 can operate over different and distinct physical links. For example, the event channel 135 may be better served by an always-on physical link whereas the data channel 130 may be better served by a less expensive (e.g., per byte transferred) or faster link. Thus, in the example of a mobile device, the event channel 135 can use a cellular radio physical link while the data channel 130 can be restricted to a local area network physical link.

The event channel 135 can pass file system element events from the first data distribution mechanism 110 to the second data distribution mechanism 145 or vice versa. File system element events can include any one or more of the following: creation, modification, deletion, or movement (e.g., moving a file from one file system location to another). In an example, only a portion of the data store 115 can be managed by the data distribution mechanism 110 for distribution to Node 140. In this example, file system element events can include the inclusion or exclusion of a file system element from the portion of the data store 115.

As used throughout, this portion of the data store 115 that is managed by the data distribution mechanism 110 for distribution to other nodes is known as a collection (e.g., file system element collection or “plan”). In an example, a collection can be one of a plurality of collection types. Example collection types can include a multi-user collection, a personal collection, or a backup collection. A multi-user collection is a collection of file system elements that may be shared and synchronized between different users of the data distribution mechanism. A personal collection may be a collection of file system elements that may be restricted to a single user of the data distribution mechanism but that may be shared across data distribution mechanisms operated by the single user and executing on different nodes. A backup collection may be a collection of file system elements that may be backed up to versioned storage by the data distribution mechanism. For brevity, the examples discussed below use the term “collection” to refer to a multi-user collection unless otherwise noted.

In an example, the Node 105 can include a plurality of collections. In an example, the data distribution mechanism 110 can manage one or more collections from the plurality of collections. The respective collections of the plurality of collections managed by the data distribution mechanism 110 may be different collection types. Thus, for example, the data distribution mechanism 110 may simultaneously manage one or more multi-user collections, personal collections, and backup collections. In another example, the data distribution mechanism 110 manages a single collection from the plurality of collections. Thus, in this example, each of a plurality of data distribution mechanisms on the Node 105 respectively manages a single collection of the plurality of collections with data in the data store 115.

The event channel 135 can also pass non-file system element events between the two Nodes 105 and 140. Example non-file system element events can include requests to another node (e.g., the Node 140), lifecycle notifications (e.g., that Node 105 is up, going down, etc.), or other events corresponding to the data distribution mechanism 110 that are not directly tied to a file system element. In an example, events can be organized on the Nodes 105 and 140 as a stream, or collection of events. In an example, the event channel 135 can be used to synchronize event streams between the two Nodes 105 and 140. In an example, the event channel 135 is specific to a collection. That is, there is a separate event channel 135 for each collection in examples where the Node 105 includes a plurality of collections. In this example, all events on the event channel 135 correspond to the collection without the need to specifically label each event with a collection affiliation (e.g., to which collection an event pertains). In an example, the event channel 135 is shared by a plurality of collections. In this example, the events can be marked with the collection to which the event applies. In an example, an event can apply to more than one collection and thus can include a list of collections to which the event applies.

The data channel 130 can pass data between the Nodes 105 and 140 that is not passed by the event channel 135. For example, the data channel 130 can pass block data to be stored in a data store (e.g., data stores 115 or 150), but the data channel 130 does not pass event data used to manage or control data distribution activities for the collection via the data distribution mechanism. For example, a notification that a file has changed and what has changed on the file is event data, whereas the bits constituting the new material in the file are file system element data. Thus, the data distribution mechanism 110 can notify the data distribution mechanism 145 that a file X has been created in the data store 115 via the event channel 135. The data distribution mechanism 145 can receive this event and request the contents of the file X from the data distribution mechanism 110. The data distribution mechanism 110 can transfer the contents of the file X to the data distribution mechanism 145 via the data channel 130, and the data distribution mechanism 145 can store the received contents in the data store 150. In an example, the data channel 130 can use different transfer characteristics than the event channel 135. For example, data transfer of file system elements may be less sensitive to latency but more sensitive to lost bandwidth due to signaling overhead. Thus, the data channel 130 may collate data into larger transmission packages than the event channel 135.
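The notify, request, and transfer exchange for file X can be sketched in a few lines of Python. This is a minimal in-memory illustration only: the class and method names (`Mechanism`, `notify_created`, `on_event`, `serve_content`) are hypothetical, and direct method calls stand in for the event channel 135 and the data channel 130.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Small control message carried on the (simulated) event channel."""
    kind: str   # e.g., "create"
    path: str   # which file system element the event concerns
    size: int   # meta data only; the content itself is not included


@dataclass
class Mechanism:
    """Hypothetical stand-in for a data distribution mechanism and its data store."""
    name: str
    store: Dict[str, bytes] = field(default_factory=dict)  # path -> content

    def notify_created(self, peer: "Mechanism", path: str) -> None:
        # Event channel: tell the peer that a file was created.
        peer.on_event(self, Event("create", path, len(self.store[path])))

    def on_event(self, sender: "Mechanism", event: Event) -> None:
        # On a create event, request the content if it is not held locally.
        if event.kind == "create" and event.path not in self.store:
            # Data channel: the bulk content travels separately from the event.
            self.store[event.path] = sender.serve_content(event.path)

    def serve_content(self, path: str) -> bytes:
        return self.store[path]


node_105 = Mechanism("110", {"X": b"contents of file X"})
node_140 = Mechanism("145")
node_105.notify_created(node_140, "X")
```

After the call, the second mechanism holds its own copy of file X, having learned of the file through the event and obtained the bytes through the separate content request.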

The Authority 120 can be coupled to the Nodes 105 and 140 when in operation. The coupling can be over any physical link, and can be encrypted, for example, as described above with respect to inter-node communication. In an example, the Nodes 105 and 140 can establish a bi-directional link to the Authority 120. Communication between the Authority 120 and the Nodes 105 and 140 occurs over the logical link 125. The logical link 125 can be implemented over one or more physical links.

The Authority 120 can manage parameters of the data distribution between the Nodes 105 and 140. The parameters can include which nodes are part of the data distribution, what users are part of the data distribution, user permissions, encryption standards, protocols, shared information (e.g., encryption keys), what data is distributed, etc. The Authority 120 can manage these parameters by providing interfaces to create, update, and delete these parameters. In an example, the interfaces can include application programming interfaces (APIs). Thus, the Authority 120 is responsible for provisioning members of the collection. In an example, the interfaces can include user interfaces.

The Authority 120 can maintain one or more data structures to store and manage the data distribution parameters. The Authority 120 can communicate a subset of the data distribution parameters to the Nodes 105 and 140 to permit those nodes to participate in the data distribution. As described above, a collection is a portion of data in a data store (e.g., the data store 115) that is distributed. In this example, the data distribution parameters are known as a collection schema (e.g., plan). The collection schema includes parameters that define the file system elements in the collection that are to be distributed, as well as management information for the data distribution. In an example, all collection schema changes are handled by the Authority 120. For example, if a user adds or associates a file system element to a collection, a request can be made from the data distribution mechanism 110 to the Authority 120 to add the file system element to the collection. The Authority 120 can modify the corresponding collection schema to include the file system element. The Authority 120 can communicate to the data distribution mechanisms 110 and 145 that the file system element is now part of the collection. The data distribution mechanism 145 can request, for example via the event channel 135, the meta data or content of the new file system element from the data distribution mechanism 110. The request can be satisfied by a data communication from the data store 115 by the data distribution mechanism 110 via the data channel 130.

FIG. 2 illustrates a relationship diagram 200 of a data distribution mechanism collection schema (e.g., collection schema or plan schema). The collection schema 205 can be a single data structure or a set of data structures with corresponding relationship data (e.g., a many-to-many table correlating collection information 255 (such as a collection ID or “CID”) with participant users 230). However, as used herein, the collection schema 205 provides a definition of the collection including file system element members 210, participant nodes 220 (e.g., collection devices), participant users 230, and general collection information 240.

The file system element members 210 include entries 215A, 215B corresponding to individual file system elements that are part of the collection. Each entry, such as entry 215A, corresponds to a single file system element and includes information pertaining to that file system element. For example, information in the entry 215A can include a file system element identification (e.g., FID), and a list of local root paths. Each local root path in the entry 215A corresponds to a file system path of the file system element on a particular node participating in the collection. In an example, an unspecified local root path can be overridden by a default value for a local root directory. This default value can be referred to as a “landing zone.” In an example, the default value for the local root path can be specific to a node type. In an example, node types can be differentiated by a device type (e.g., a mobile phone vs. a data center server) or an operating system (e.g., a tablet operating system vs. a sophisticated multi-user operating system). In an example, a collection can include a restriction on local root directories of the file system element members. In an example, the restriction specifies that the local root directory must be the landing zone.
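The local root lookup, including the landing zone fallback and the optional restriction, can be sketched as follows. The function name, the node type labels, and the landing zone paths are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical per-node-type defaults ("landing zones"); paths are illustrative.
LANDING_ZONES: Dict[str, str] = {
    "mobile": "/sdcard/Collections",
    "server": "/srv/collections",
}


def resolve_local_root(
    local_roots: Dict[str, str],   # NID -> local root path from a member entry
    nid: str,
    node_type: str,
    landing_zone_only: bool = False,
) -> str:
    """Return the local root for a member on a node, defaulting to the landing zone."""
    default = LANDING_ZONES[node_type]
    root = local_roots.get(nid)
    if root is None:
        # Unspecified local root: fall back to the node type's landing zone.
        return default
    if landing_zone_only and root != default:
        # Collection-level restriction: the local root must be the landing zone.
        raise ValueError(f"local root {root!r} must be the landing zone {default!r}")
    return root


roots = {"N1": "/home/alice/docs"}
explicit = resolve_local_root(roots, "N1", "server")
fallback = resolve_local_root(roots, "N2", "mobile")
```

Here node N1 keeps its explicit path, while node N2, which has no entry, receives the landing zone for its node type.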

In an example, the entry 215A can also include file system element meta data with respect to the collection. This file system element meta data may indicate information such as when the file system element was added to the collection, the user responsible for adding the file system element, the last time the file system element was modified, the last time the file system element was distributed to this node or from this node, information on what version of multiple versions the file system element is, etc.

The participant nodes 220 include entries 225A, 225B, 225C corresponding to individual nodes. Each entry, such as entry 225A, includes a node identification (e.g., NID) to uniquely identify one node from another. The entry 225A can also include connectivity information about the node. In an example, the connectivity information can include a routable address that can be used to reach the node of the entry 225A. For example, if the node is reachable from the Internet (e.g., if the node has an Internet-routable Internet Protocol (IP) address), that address can be included in the connectivity information. In an example, the connectivity information can include a connected status (e.g., node is connected, node is connected but unavailable, etc.). In an example, the connectivity information can include connection quality information, such as latency or bandwidth metrics, the operating state of the node (e.g., a laptop on battery power), or processing capabilities of the node (e.g., high available storage input-output (IO) of the node). In an example, the connectivity information can be a connectivity score supplied by the node of the entry 225A.

The participant users 230 include entries 235A, 235B corresponding to individual entities to which collection permissions can be assigned. Thus, participant users need not correspond to a specific person or even group of people, but can include permissions assigned to a third party data consumer (e.g., an auditing enterprise or a social network). In an example, the participant nodes 220 are assigned to the collection via a user entry. For example, when a node connects to an Authority, the node authenticates using credentials in a user entry. Thus, the permissions of the particular user entry (e.g., the user entry 235A) accrue to the node.

The user entry 235A can include a user identification (e.g., UID), collection permissions, node permissions, file system element permissions, or an activity log. The collection permissions are general to the collection as a whole, such as granting other users permissions to the collection, updating collection parameters, etc. Node permissions can be specific to nodes that are part of the collection but, for example, not under the control of the user. Node permissions can include permissions for operations such as a remote wipe (e.g., forcible deletion of data on a remote node), local file overwrite (e.g., forcing the overwriting of a file system element on the remote node), etc. The file system element permissions can include read or write permissions to individual file system element members 210. The activity log can store the activity of the user. In an example, the activity log is composed of event stream events attributable to the user. In this fashion, the participants in the collection can establish various permissions for the collection, including implementing permission categories of the collection such as guest (read-only permission to file system elements), contributor (read/write permission to file system elements), and administrator (complete control over file system elements and the collection). In an example, a user who invites another user can specify the permission level to provide to the other user.
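The guest, contributor, and administrator categories can be modeled as a simple mapping from category to permitted actions. The sketch below is an assumption-laden illustration: the action names (`read`, `write`, `manage_collection`) and the `is_allowed` helper are hypothetical and are not defined by the description.

```python
from enum import Enum


class Category(Enum):
    GUEST = "guest"
    CONTRIBUTOR = "contributor"
    ADMINISTRATOR = "administrator"


# Hypothetical action names; the mapping mirrors the three categories above.
ALLOWED_ACTIONS = {
    Category.GUEST: {"read"},
    Category.CONTRIBUTOR: {"read", "write"},
    Category.ADMINISTRATOR: {"read", "write", "manage_collection"},
}


def is_allowed(category: Category, action: str) -> bool:
    """Return True if the permission category permits the requested action."""
    return action in ALLOWED_ACTIONS[category]


guest_can_write = is_allowed(Category.GUEST, "write")
contributor_can_write = is_allowed(Category.CONTRIBUTOR, "write")
```

An inviting user could, under this sketch, record the chosen `Category` in the invited user's entry so that later requests are checked against it.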

The collection can also include general collection information (e.g., meta data) 240. The general collection information 240 serves as a repository for single items with applicability to the collection. For example, the collection data can be encrypted, such as with the use of 256-bit AES data encryption. The encryption data 245 can include keys (symmetric or asymmetric) to encrypt or decrypt the data. The encryption data 245 can include or otherwise indicate acceptable encryption protocols. The Authority entry 250 can be used when a central Authority manages the collection or the collection schema 205. The Authority entry 250 can identify or otherwise indicate the one or more machines housing an Authority. The collection identification 255 (e.g., CID) can uniquely identify the collection among other collections present on the nodes or managed by the Authority. The name entry 260 can provide a name for the collection. This can be useful to allow persons to identify collections without resorting to collection IDs. The general collection information 240 can also include one or more entries for other meta data 265. The other meta data can include such information as when the collection was created, who or what created the collection, whether the collection is active, inactive, archived, etc.
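One way to picture the collection schema 205 of FIG. 2 as a set of related data structures is sketched below. The class and field names are hypothetical and cover only a few of the attributes described; the reference numerals in the comments indicate which part of FIG. 2 each class loosely corresponds to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MemberEntry:                      # cf. entries 215A, 215B
    fid: str
    local_roots: Dict[str, str] = field(default_factory=dict)  # NID -> path


@dataclass
class NodeEntry:                        # cf. entries 225A, 225B, 225C
    nid: str
    routable_address: Optional[str] = None
    connected: bool = False


@dataclass
class UserEntry:                        # cf. entries 235A, 235B
    uid: str
    collection_permissions: List[str] = field(default_factory=list)


@dataclass
class CollectionInfo:                   # cf. general collection information 240
    cid: str
    name: str
    authority: Optional[str] = None


@dataclass
class CollectionSchema:                 # cf. collection schema 205
    info: CollectionInfo
    members: Dict[str, MemberEntry] = field(default_factory=dict)
    nodes: Dict[str, NodeEntry] = field(default_factory=dict)
    users: Dict[str, UserEntry] = field(default_factory=dict)

    def add_member(self, fid: str, nid: str, local_root: str) -> MemberEntry:
        """Add a file system element member, or record another node's local root."""
        entry = self.members.setdefault(fid, MemberEntry(fid))
        entry.local_roots[nid] = local_root
        return entry


schema = CollectionSchema(CollectionInfo(cid="C1", name="Shared documents"))
schema.nodes["N1"] = NodeEntry("N1", routable_address="203.0.113.7", connected=True)
schema.add_member("F1", "N1", "/home/alice/docs")
schema.add_member("F1", "N2", "/data/docs")
```

The same member (F1) ends up with a distinct local root on each node, which is the property the local root paths are meant to capture.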

FIG. 3 illustrates an example of a hierarchy-based file system 300 with multiple collections, the collection 310 and the collection 315. The file system 300 is depicted as being hosted on a single node (e.g., the Node 105 described above with respect to FIG. 1). However, in clustered computing environments, the node can include a plurality of physical machines and storage devices that manage and store the file system 300 (e.g., a distributed or clustered file system). As illustrated, file system element 305 is the single root of the file system 300. The collection 310 includes two file system element members, file system element 320 and file system element 330. The contents of the file system element member 320 include the file system elements below the file system element member 320 in the file system 300 hierarchy as illustrated by the collection 310 outline excluding file system element member 330. The same is true of file system element member 325 of the collection 315. The inclusion of the file system element member 330 in the collection 310 removes the file system element member 330 from the contents of the file system element member 325 even though file system element member 330 is within the file system hierarchy rooted at file system element member 325. Accordingly, a file system hierarchy can define the contents of a file system element (e.g., a directory) until such hierarchy conflicts with the specific inclusion of a file system element as a file system element member of a different collection.
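The containment rule of FIG. 3 can be read as "the nearest enclosing member decides the collection." The sketch below illustrates that reading under stated assumptions: the paths are invented stand-ins for elements 320, 325, and 330, and the `owning_collection` function is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def owning_collection(path: str, members: Dict[str, str]) -> Optional[str]:
    """Return the collection whose member is the deepest ancestor of path.

    `members` maps a member's local root path to its collection ID. A member
    explicitly included in one collection overrides a shallower member of
    another collection that merely contains it in the hierarchy.
    """
    target = PurePosixPath(path)
    best_root = None
    best_cid = None
    for root, cid in members.items():
        candidate = PurePosixPath(root)
        if target == candidate or candidate in target.parents:
            if best_root is None or len(candidate.parts) > len(best_root.parts):
                best_root, best_cid = candidate, cid
    return best_cid


# Illustrative layout: element 330 sits under element 325 but belongs to 310.
members = {
    "/e320": "collection-310",
    "/e325": "collection-315",
    "/e325/e330": "collection-310",
}
inside_330 = owning_collection("/e325/e330/report.txt", members)
inside_325 = owning_collection("/e325/notes.txt", members)
```

With this layout, a file under element 330 resolves to collection 310 even though it lies within the hierarchy rooted at element 325, while other files under 325 resolve to collection 315.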

FIG. 4 illustrates an example of a file system 400 including the collection 310 (described previously in FIG. 3), the file system 400 being on a different node (e.g., the Node 140 described above with respect to FIG. 1) than the node hosting the file system 300 described with respect to FIG. 3. The file system element 410 is a root node for the collection 310 in the file system 400. This hierarchy, when contrasted to that of the file system 300, illustrates that the local root for a file system element member of a collection may vary between nodes. Further, the change in hierarchical position of the file system element member 330 between the file system 300 and the file system 400 illustrates the flexibility of having a local root node attribute that specifies the local root for each element in the collection for each node (e.g., see FIG. 2, 215A-215B) in the collection schema. The contents of the file system element member 320 have not changed, as illustrated in the preserved hierarchy of file system elements in the file system element 320 between the file systems 300 and 400. Thus, file system element members of the presently described collections provide greater flexibility in managing file system elements than maintaining a single hierarchy. Further, the file system element members of the presently described collections also relieve the system of the need for managing every file system element in a collection at the collection level.

While FIGS. 3 and 4 illustrate movement of a file system element from one root on a first node to another root on a second node using the local root data for a collection, a scenario may arise in which the local root data is out-of-sync with the collection. For example, a user can use file system tools on the node of FIG. 3 to move the file system element 320 to a root other than the root file system element 305 while the data distribution mechanism is unable to notice this change at the time (e.g., the data distribution mechanism was disabled or off). In this scenario, when the data distribution mechanism is able, it can verify the local root data for the node and determine that the file system element 320 is no longer present at the specified local root. The data distribution mechanism may then search for the file system element 320 and, if found, update the local root data to reflect the new local root for the file system element 320.

In an example, to aid in the search, the file system element 320 can be tagged to allow it to be identified if found. For example, when the file system element is a directory, a file (e.g., a hidden file) can be placed in the directory that specifies the collection affiliation of the file system element 320. In an example, such as with file systems that permit labels, tags, or other meta data, the collection affiliation can be placed in such meta data. This technique can work equally well with both file and directory file system elements. In an example, specific file types can include flexible meta data that can accept the collection affiliation; thus, the collection affiliation can be placed in this meta data.

In an example, for identification purposes, the file system element can be assigned an identifier computed from the file system element itself. Thus, the identifier can be stored in a database of the data distribution mechanism, and each file system element checked can have its identifier computed and verified against the database. For example, a hash of the file meta data, contents, etc., can be used as an identifier for the file system element and stored in the database. Then, during the search of the file system, each encountered file system element can be similarly hashed and compared against the database records. If the hashes match, the local root data for that file system element can be updated to reflect its new location.
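A search of this kind can be sketched with the Python standard library. The sketch makes several assumptions that the description leaves open: the identifier is a SHA-256 hash of file contents only (so that it survives a move), a plain set stands in for the database, and only files, not directories, are considered. The function names are hypothetical.

```python
import hashlib
import os
from typing import Dict, Set


def content_id(path: str, chunk_size: int = 65536) -> str:
    """Compute an identifier for a file from its contents (assumed: SHA-256)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_moved(search_root: str, known_ids: Set[str]) -> Dict[str, str]:
    """Walk search_root and map each known identifier to the path where it was found."""
    found: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            identifier = content_id(candidate)
            if identifier in known_ids and identifier not in found:
                found[identifier] = candidate
    return found
```

A caller would compute `content_id` when the element joins the collection, store it, and later pass the stored identifiers to `find_moved` to learn the new location, which can then be written back as the local root data.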

FIG. 5A illustrates an example of a participant Node 505A. The Node 505A can include both Collection A event data 510 and Collection B event data 515. That is, the Node 505A is a member of two collections and participates in the event stream of these collections. The Node 505A can also include a local data store 520A. In an example, the local data store 520A is a file system of the Node 505A. The local data store 520A can store the contents of file system elements that are part of one or both collections A or B.

In an example, the contents of file system elements are managed at a block level. That is, instead of managing the contents of file system elements as a single entity, the contents of file system elements may be divided into blocks for data distribution (and in some cases, data storage). In an example, the blocks are variable length. In an example, the Node 505A may include a block index 525 to manage the blocks independently from the file system elements of the collections. Individual block management may facilitate a reduction in data that needs to be transmitted on the data channel when transferring, for example, files between two nodes. For example, a first node may create an event indicating that file X in Collection A was created. In an example, the file creation event can include the blocks that constitute file X. In an example, the file creation event may refrain from including the blocks that constitute file X and instead provide notice of those blocks in a separate event. The Node 505A may have access to the file creation event and the constituent block notification for file X in the Collection A event data 510 (e.g., via a synchronized event stream with the first node). In an example, the Node 505A may act on the received event to create a local representation of the file X in the local data store 520A. In the process of creating the local file X, the Node 505A may reference the block index 525 to determine if the local data store 520A already has a copy of one or more constituent blocks of the file X. For example, if the same file, e.g., an image, exists in both Collections A and B, the Collection B data in the local data store 520A may include those blocks. Thus, the Node 505A can create the local copy of the file X by copying the blocks from a Collection B file system element as opposed to transferring those blocks from the first node (e.g., over the data channel).
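The block index lookup described above can be sketched as follows. The sketch assumes the block index is a simple mapping from a block identifier to a local location; the function name plan_file_assembly, the identifiers, and the location strings are hypothetical.

```python
def plan_file_assembly(file_blocks, block_index):
    """Split a file's block list into blocks copied locally and blocks to transfer."""
    copy_locally, transfer = [], []
    for block_id in file_blocks:
        if block_id in block_index:
            # The block already exists in the local data store (any collection).
            copy_locally.append((block_id, block_index[block_id]))
        else:
            transfer.append(block_id)
    return copy_locally, transfer


# Hypothetical block index: block identifier -> where the block lives locally.
block_index = {
    "b1": "collection_b/image.png@0",
    "b2": "collection_b/image.png@1",
}
file_x_blocks = ["b1", "b2", "b3"]   # blocks announced for file X in Collection A
copy_locally, transfer = plan_file_assembly(file_x_blocks, block_index)
```

Here only block b3 would travel over the data channel; b1 and b2 are copied from the Collection B element already held locally.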

FIG. 5B illustrates an example of a participant Node 505B. The Node 505B differs from the Node 505A in that the local data store 520B is limited. A limited data store is a data store that is restricted from holding the contents of every file system element member of the collections of which the node is a member. In an example, the data store can be restricted via a lack of available storage capacity, via a user-defined quota, and the like. This is typically the case with mobile computing devices, such as cellular telephones, tablets, etc., which typically include significantly less storage than other classes of computers (e.g., desktop or server machines). However, a desktop with great storage capacity in the local data store 520B would be considered to have a limited data store if such desktop is also restricted from holding the contents of every file system element member of the collection of which the desktop is a member. In an example, the set of elements of the collection for which the node stores the contents may be referred to as the “locally available” set of elements. In the case of a node such as Node 505A, the locally available set may be the entire collection. In the case of Node 505B with a limited data store, the locally available set may be a subset of the entire collection. The selection of elements to include in the locally available set of elements will be discussed later with respect to FIGS. 25-30.

FIG. 6 is a network communication diagram of an example data distribution system 600. The data distribution system 600 includes four nodes, Nodes A 620, B 625, C 640, and D 635. Also illustrated are Collection A 605 and Collection B 610. Nodes A 620, B 625, and C 640 are participants in the Collection A 605. Nodes A 620, D 635, and C 640 are participants in the Collection B 610. The Authority 615 is illustrated outside of either Collection A 605 or B 610. The Authority 615 is communicatively coupled to every participant node, illustrated by the thin lines (e.g., line 650). As described above with respect to FIG. 1, the Authority 615 manages schemas for the collections A 605 and B 610. As part of this management, the Authority 615 authenticates the nodes (e.g., via a participant user of the schema) and tracks node availability for a given collection. The Authority 615 also provides portions of the collections' schemas to the nodes (e.g., to a data distribution mechanism on the nodes). These portions can include file system element members, general information (e.g., encryption data), or participant nodes (as described above with respect to FIG. 2).

The thick lines (e.g., line 645) illustrate inter-node communication within a collection. The inter-node communication can include an event channel. In an example, the inter-node communication can also include a data channel. For example, Node B 625 can connect to Collection A 605 participant Nodes A 620 and C 640 after receiving notice that Nodes A 620 and C 640 are participants to the Collection A 605 (e.g., via the Authority 615 transmitting the collection schema for the Collection A to the Node B 625). In an example, the Authority 615 does not participate in providing events or data to the connected nodes via inter-node communications. However, in the example illustrated in FIG. 6, the Node B 625 can establish an inter-node communication to the Authority 615. The inter-node communication to the Authority 615 can facilitate, for example, event sharing from Node B 625 to Node C 640 in a circumstance where direct node-to-node communication is infeasible or impossible. (As further detailed in FIG. 11 below, a store point node, and not the authority, may facilitate event and data flow in such circumstances where direct node-to-node communication is not available between endpoint nodes.)

In an example, establishing inter-node communication includes connecting from the Node B 625 to Node A 620 using a physical link. Information enabling the connection (e.g., protocol, address, authentication information, etc.) can be obtained from the collection schema for the Collection A 605. In an example, a node connects to every other participant node in the collection that the node can reach. In an example, a node connects to a subset of participant nodes to the collection, the subset specified in the collection schema. In an example, participant nodes in the collection schema can include connection metrics used to order or prefer nodes for connections. Connection metrics can include measurements of bandwidth, latency, cost of the connection (e.g., bytes transferred, etc.), storage capacity, security (e.g., has a virus, is secured below a threshold, etc.), power (e.g., battery power, mains power, intermittent power such as wind or solar, etc.), among other things. Thus, for example, if Node A 620 is running on battery power and has a high-cost physical connection (e.g., a metered cellular physical link), and Node C 640 is a server machine on the same local area network as Node B 625, and Node B 625 is restricted to connecting to a single other node, Node B 625 can choose (or be directed by the Authority 615) to connect to Node C 640. Additional examples of inter-node connections are discussed below with respect to FIG. 9.

As noted above, a part of the inter-node communication is an event channel. In an example, part of establishing the inter-node communication, after the physical link is established, is synchronizing the event channel of the connecting node (e.g., Node B 625) to every connected node (e.g., Nodes A 620 and C 640). As events are shared, each node maintains some subset of events that the node has received. In an example, the subset is the entirety of all received events. In an example, the subset is the result of trimming (e.g., discarding or disregarding a portion of) the received events. In an example, the trimming is based on time. Thus, old events can be discarded. In an example, the trimming can be based on a superseding event. For example, a first event indicating modification of a file X from version one to version two can be superseded, and thus trimmed, when a subsequent second event indicates modification of the file from version two to version three. A superseded event has no information relevant to a current version of a file system element that is not included in a subsequent event. Thus, if the changed contents of a file system element are indicated incrementally (e.g., only the blocks that change from one version to the next are indicated as opposed to all blocks that make up the current version), then it is less likely that a modification event will be superseded by a later event.

In an example, version vectors are used to perform event trimming. The version at issue is a version of an object to which the event pertains, such as a file system element. The following includes example scenarios and uses of version vectors for event synchronization and event trimming (e.g., pruning).

Version vectors can be used to determine causality within partially ordered events that occur in a distributed system with optimistic replication. Optimistic replication means that clients can make their own updates without first getting permission from a central source or verifying with all other clients. Partial ordering means that some events occur sequentially but others may occur concurrently, and have no set order based on a previous event. Causality means that one event not only succeeded another, but also occurred as a consequence of the previous event; that is, the two events have a direct relationship. All of this means that, given a file with two updates A and B, we can determine whether A was based on B, B was based on A, or they are unrelated.

Version vectors can be represented as an associative array or map. Each node has an entry in the vector with a key that identifies the node and a value that describes its version. The version value can be a timestamp or counter. In this example, it is a counter, incremented so that the new version of the file is one greater than the maximum version in the existing vector held by the node doing the incrementing.

As an example, consider nodes P1 and P2 with identifiers 123 and 456. P1 creates a file and sets its initial version vector to [123,1]. P2 gets the file and makes an update; the version vector is now [123,1],[456,2]. P1 receives the update from P2 and makes a further change, updating the version vector to [123,3],[456,2].
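The counter rule and the P1/P2 sequence above can be reproduced with a short sketch. It assumes the vector is held as a Python dictionary keyed by node identifier; the function name increment is hypothetical.

```python
def increment(vector, node_id):
    """Return a copy of vector in which node_id's counter is max(existing counters) + 1."""
    updated = dict(vector)
    updated[node_id] = max(vector.values(), default=0) + 1
    return updated


after_create = increment({}, 123)                   # P1 creates the file
after_p2_update = increment(after_create, 456)      # P2 updates the file
after_p1_update = increment(after_p2_update, 123)   # P1 makes a further change
```

The three vectors correspond to [123,1], then [123,1],[456,2], then [123,3],[456,2].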

Causality is established when one vector dominates another. A vector $V_y$ is said to dominate a vector $V_x$ when $V_y[i] \geq V_x[i]$ for every node $i$ (a missing entry is treated as zero) and $V_y[j] > V_x[j]$ for at least one node $j$. In that case, $V_y$ causally succeeds $V_x$. If neither vector dominates the other, then they are causally concurrent, and potentially in conflict.

Consider the previous example, when P1 receives the update from P2. It compares each element in the incoming vector with its own to determine dominance:

P2[123,1]=P1[123,1]

P2[456,2]>P1[456,0] (assume value zero if no local entry for a node)

P2>=P1 across all vector elements, so P2 is dominant; it causally succeeds P1's update.

To demonstrate concurrent updates, continue the previous example where P1 and P2 share the same version [123,3],[456,2]. Now both make an update and send each other the updated version vector, but neither vector will dominate:

P1[123,4],[456,2]≠P2[123,3],[456,4]

Version vectors allow the determination of causality given any arbitrary object updates. Version vectors also allow us to detect when a concurrent (conflicting) update has occurred.
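The two comparisons worked through above can be sketched as a single function. The sketch assumes a vector dominates another when none of its counters is smaller and at least one is larger, with a missing entry treated as zero as in the worked example; the function name compare and its return strings are hypothetical.

```python
def compare(a, b):
    """Classify two version vectors: 'equal', 'a_dominates', 'b_dominates', or 'concurrent'."""
    nodes = set(a) | set(b)
    # A missing element defaults to zero, as in the worked example.
    a_larger = any(a.get(node, 0) > b.get(node, 0) for node in nodes)
    b_larger = any(b.get(node, 0) > a.get(node, 0) for node in nodes)
    if a_larger and b_larger:
        return "concurrent"
    if a_larger:
        return "a_dominates"
    if b_larger:
        return "b_dominates"
    return "equal"


# P1 receives P2's update: P2 is dominant.
first = compare({123: 1, 456: 2}, {123: 1})
# Both nodes update the shared version [123,3],[456,2]: neither dominates.
second = compare({123: 4, 456: 2}, {123: 3, 456: 4})
```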

As used in the discussed collection framework: in an example, each node can use its GUID as the identifier for its element in the vector; in an example, only nodes that have contributed an update to an object will have an element in the version vector; and in an example, a node only increments the version vector for an object when it publishes the update to the collection, not necessarily when file changes are detected locally.

Vector pruning (e.g., trimming) may occur according to an example. As more nodes contribute to an object, the size of its version vector grows. In the vast majority of cases, an object has only a small set of contributors (writers) and most peers are just readers, so the size of the version vector is not an issue. A bad-case scenario can involve a large corporate collection that includes every employee (such as a human resources collection with 250,000+ contributors), each of whom has contributed a single edit to an object. Even though most people will never contribute again to this file, they are forever carried along in the version vector.

Given a size of 12 bytes for each element in a vector (an 8 byte GUID and a 4 byte version value), in the HR collection scenario the version vector that is carried along with each object change/update would require 2.86 MB. In an example, pruning relies on the premise that causality can be determined using only the vector elements from recent contributors (e.g., assuming that after some time, every node's vector reflects the same value for old entries, so only the recent ones would differ). To support pruning, each element in the vector can include a timestamp of when the element was last modified. The decision to prune a vector element can be determined by two configurable attributes: the size of the vector and the age of the element. Each of these attributes is a range. When a vector is at least large enough, and an element is at least old enough, the element is removed from the vector. An issue can arise when two nodes that are exchanging version vectors have not pruned them equally. This may lead to a false reporting of conflicts. As an example, consider two nodes that initially have the same version vector, [123,101],[456,98],[789,2]. P1 makes an update [123,102], but also detects that it is time to prune the aged element for 789, and then transmits its changes to P2. When P2 compares the vectors, it finds that P1 is missing an element for 789, and so will use a default value of 0:

P2[123,101],[456,98],[789,2]≠P1[123,102],[456,98],[789,0]

The vector comparison will be incorrectly detected as concurrent. If P1 had sent the actual vector element for 789, P2 would have correctly determined that its vector was dominated by P1.

Virtual pruning can be used to address the unequally pruned vector problem mentioned above. With virtual pruning, a vector comparison is limited to the common elements of the two version vectors when it is deemed likely that any differences are due to unsynchronized pruning. With virtual pruning, P2 would have detected that the element for 789 had been pruned by P1, and would not have considered it in its vector comparison, correctly determining that P2[123,101],[456,98]<P1[123,102],[456,98].

In operation, with virtual pruning, when a node compares two version vectors, it first will identify any elements that exist in its own vector but are not present in the other vector. For each of these elements, the node checks whether the element is eligible to be pruned. When reconciling using virtual pruning, some slop time is added to the elements' age to compensate for clock sync issues. Next, the node does the opposite operation to find any elements from the other vector that are missing from the local vector. Again, it determines if any of these elements should have been pruned by the other node. Any eligible elements are virtually pruned in each vector, and the final vectors are then compared.
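The virtual pruning steps above can be sketched as follows, using the unequally pruned P1/P2 vectors from the earlier example. The sketch makes several assumptions: each element is a (counter, last-modified time) pair, eligibility depends only on element age plus slop (the vector size attribute is ignored for brevity), and the names counter, dominates, and virtually_prune and all time values are hypothetical.

```python
def counter(vector, node):
    """Counter for node, or zero when the vector has no element for it."""
    return vector[node][0] if node in vector else 0


def dominates(a, b):
    """True when no counter in a is smaller than in b and at least one is larger."""
    nodes = set(a) | set(b)
    no_smaller = all(counter(a, n) >= counter(b, n) for n in nodes)
    some_larger = any(counter(a, n) > counter(b, n) for n in nodes)
    return no_smaller and some_larger


def virtually_prune(local, other, now, max_age, slop):
    """Drop elements missing from the opposite vector that are old enough to have been pruned."""
    def eligible(element):
        _count, modified = element
        return (now - modified) + slop >= max_age   # slop compensates for clock skew

    local_kept = {n: e for n, e in local.items() if n in other or not eligible(e)}
    other_kept = {n: e for n, e in other.items() if n in local or not eligible(e)}
    return local_kept, other_kept


# Elements are (counter, last-modified time in seconds); times are illustrative.
p2 = {123: (101, 990.0), 456: (98, 995.0), 789: (2, 100.0)}
p1 = {123: (102, 1000.0), 456: (98, 995.0)}          # P1 has already pruned 789

naive_conflict = not dominates(p1, p2) and not dominates(p2, p1)
p2_kept, p1_kept = virtually_prune(p2, p1, now=1000.0, max_age=20.0, slop=2.0)
p1_wins = dominates(p1_kept, p2_kept)
```

Without virtual pruning the comparison reports a conflict; after the aged element for 789 is virtually pruned from P2's vector, P1's vector dominates.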

Virtual pruning can greatly reduce false conflicts. In an example, a Collection Time (e.g., a synchronized clock across collection participants) can allow greater reliance on virtual pruning, as it can be assumed that the clocks of two nodes are loosely synchronized, provided time-based operations are based on Collection Time in each node.

In an example, detected conflicts can be resolved using a last-write-wins rule, where the winning object change remains the same whether a conflict is detected or not. The detection of the conflict may incur additional storage costs locally, as the losing file can be copied elsewhere to prevent data loss.

Revisiting the example above using virtual pruning, assume values of vector max size>50 and element age>20 sec. Because an 8 byte timestamp is now included, the size of each vector element is 20 bytes. If there were a sudden flurry of edits by every member, the vector could reach a large size of 4.77 MB, and pass the size threshold of 50. But for the next 20 seconds the element age threshold would not be met. After 20 seconds of system degradation, pruning would take effect and the vector would now contain the latest 50 entries, with a vector size=1000 bytes. After 24 hours the max age range is passed, and the vector is pruned down to its smallest size (10), yielding a normalized size=200 bytes.
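The size figures in the HR collection scenario can be checked with simple arithmetic. The sketch assumes exactly 250,000 contributors and that the megabyte figures are computed with 1 MB = 1,048,576 bytes, which is the interpretation that reproduces 2.86 and 4.77; the variable names are hypothetical.

```python
CONTRIBUTORS = 250_000
BYTES_PER_MB = 1024 * 1024          # assumption: the MB figures use 2**20 bytes

element_without_timestamp = 8 + 4   # GUID + version value
element_with_timestamp = 8 + 4 + 8  # GUID + version value + timestamp

unpruned_mb = round(CONTRIBUTORS * element_without_timestamp / BYTES_PER_MB, 2)
flurry_mb = round(CONTRIBUTORS * element_with_timestamp / BYTES_PER_MB, 2)
after_age_threshold = 50 * element_with_timestamp    # latest 50 entries
after_max_age = 10 * element_with_timestamp          # smallest size of 10 entries
```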

Event stream synchronization can be based on an event sequence that is, for example, globally or collection-wide unique. Thus, Node B 625 can connect to Node C 640 and request Node C's 640 last event sequence. If the last event sequence matches its counterpart in Node B's 625 event stream, then Node B 625 can determine that the event streams are synchronized. Similarly, if the last event sequence of Node C 640 indicates an earlier event than the last event sequence of Node B 625, Node B 625 transmits the events between the last event sequence of Node C 640 and the last event sequence of Node B 625.
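The catch-up step described above can be sketched as follows. The sketch assumes event sequences are increasing integers, which is stricter than the uniqueness the description requires; the function name events_to_send and the event fields are hypothetical.

```python
def events_to_send(local_events, peer_last_sequence):
    """Return the local events that come after the peer's last event sequence."""
    return [event for event in local_events if event["sequence"] > peer_last_sequence]


# Node B holds events 1..5; Node C reports that its last event sequence is 3.
node_b_events = [{"sequence": n, "kind": "file_modified"} for n in range(1, 6)]
missing_on_c = events_to_send(node_b_events, peer_last_sequence=3)

# When the peer's last sequence matches the local one, nothing needs to be sent.
in_sync = not events_to_send(node_b_events, peer_last_sequence=5)
```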

After event stream synchronization, the nodes (e.g., Nodes A 620, B 625, and C 640 for Collection A 605) can continue to pass events to each other via the inter-node communication. As described above with respect to FIG. 1, these events can include file system element events as well as data distribution element events. In an example, every event generated by a given node (e.g., Node B 625) is transmitted to every connected node in the collection (e.g., Collection A 605). In an example, an event can be specifically addressed to a particular node. For example, although Node B 625 is connected to both Nodes A 620 and C 640, Node B 625 can address the event to Node C 640. In an example, addressing the event can include only sending the event to the desired node. In an example, the event can include a destination identifier on the event when the event is communicated to one or more connected nodes. This may be useful when, for example, the destination node is not directly connected to the source node, but there is a connection path (e.g., including an intervening node) between the two nodes. Specifically addressed events allow for specific requests to be made from nodes uniquely suited to handle the request while avoiding transmission to nodes not suited to the request. For example, Node C 640 can include multiple versions of file system elements for Collection A 605 while Nodes A 620 and B 625 include only the latest versions of these file system elements. Thus, if Node B 625 wishes to restore a file, for example, to a previous version, Node A 620 cannot satisfy the request while Node C 640 can.

As noted above, the inter-node communication can include a data channel. As described above with respect to FIG. 1, in an example, the data channel can be differentiated from the event channel by its optimization for data transfers. Thus, while control information is passed in the event channel, the data channel operates to push raw or bulk data, such as file system element content data, between nodes. For example, an event can include a current version indicator of a file with a listing of blocks making up that file. The listing can include block identifiers, but not the actual blocks themselves. An example identifier can be all or part (e.g., the most significant half of bits) of a hash value (such as an MD5 checksum). The identifiers can be used to determine which blocks are available locally and which blocks need to be obtained to reconstruct the entirety of the file. An event can be sent to identify or request these blocks. When the blocks are transmitted, they are transmitted via the data channel. The data distribution system thus is a two-step event-based system. Rather than a push model, where a central server pushes data (e.g., files or changes to files) to participants, nodes may exchange events and each individual node may request (e.g., pull) the data corresponding to the events.
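An event that lists block identifiers without carrying the blocks themselves can be sketched as follows. The sketch assumes the identifier is the first 8 bytes of the 16 byte MD5 digest, taken here as the most significant half; the function names block_id and make_listing_event and the event fields are hypothetical.

```python
import hashlib


def block_id(block):
    """Identifier: the first half (8 of 16 bytes) of the block's MD5 digest, as hex."""
    return hashlib.md5(block).digest()[:8].hex()


def make_listing_event(path, version, blocks):
    """Build an event that names a file version and its block identifiers, not the block data."""
    return {
        "type": "file_version",
        "path": path,
        "version": version,
        "blocks": [block_id(block) for block in blocks],
    }


event = make_listing_event("X", 2, [b"first block", b"second block"])
```

A receiving node could compare the listed identifiers with its own blocks, request the missing ones with a further event, and receive the block contents over the data channel.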

In an example, every event is acted on, or executed, when the event is received by a node. In an example, the node can defer activation of all or some events (e.g., file system element events). Such a node can be referred to as a passive node. A passive node has a synchronized event stream, but not necessarily a synchronized local data store. Such an arrangement can be useful in nodes with a limited local data store, such as that described above with respect to FIG. 5B. A node that neither receives nor acts on events can be termed an inactive node if such node was ever a participant node in the collection schema.

As illustrated in FIG. 6, inter-node communication can be specific to a collection. Thus, the nodes in Collection A 605 establish inter-node communication between each other and the nodes in Collection B 610 establish inter-node communication between each other. However, nodes can be participants in more than one collection, as illustrated by Nodes A 620 and C 640. In an example, the inter-node communication between these nodes can share a physical link. Thus, Node A 620 can establish Collection A 605 inter-node communication and Collection B 610 inter-node communication to Node C 640 over a single physical link (e.g., the same TCP/IP connection).

As illustrated in FIG. 6, nodes communicate meta data throughout participant nodes in a collection. By engaging in this distributed meta data sharing about the collection data, including file system element data, the data distribution can be resilient to disruption, and also offer timely sharing of data between the participant nodes. Managing the collection schema from the Authority 615 can also address concerns for managing the distributed design of the collection without overly burdening participating nodes, thus allowing more nodes to participate in the data distribution capabilities of the collections. Thus, a robust and efficient data distribution mechanism can be implemented using the communication network illustrated in FIG. 6.

FIG. 7 illustrates an example implementation of a participant node network 700. Implementing the general network data distribution system 600 discussed above with respect to FIG. 6, the network 700 can include Nodes 705, 710, 715, and 720 with respective local data stores 725, 730, 735, and 740. Nodes 705, 710, and 720 are participants to Collection A, with Collection A data distributed between them. Nodes 705, 715, and 720 are participants to Collection B, with Collection B data distributed between them. The Collection A and B data is stored in the local data stores 725, 730, 735, and 740 of the nodes. As illustrated, the solid connecting lines are Collection A inter-node communication while the dashed connecting lines are Collection B inter-node communication. All of the illustrated inter-node communications include both events and data (e.g., an event channel and a data channel). While all of Collection A's participant nodes are interconnected, FIG. 7 illustrates a case in which Node 705 is not directly connected to Node 715 although both are Collection B participants. In this case, events and data (unless Node 720 is passive) originating from the Node 715 will be synchronized with Node 720. Because Node 705 maintains its synchronization with Node 720, the changes originating from Node 715 will propagate to Node 705 via Node 720. In an example, a node (e.g., Node 720) can rebroadcast all events received to every other connected node within the collection. This example allows every node to act as a network repeater for the inter-node communication.

FIG. 8 illustrates an example implementation of a participant node network 800 with a multi-segment physical network. As illustrated, all nodes are participants of a single collection. The shaded network connections (e.g., connection 865) represent event-only communication (e.g., passive participant communication) and the unshaded connections (e.g., connection 860) represent event and data communication. The network 800 includes four network segments: a local area network (LAN) 805, a cloud network 810, a remote site 820, and a wide area network (WAN) 815. Thus, with respect to LAN 805, Nodes 825, 830, and 835 can be considered local nodes, Node 840 can be considered a cloud node, and both of Nodes 850 and 845 can be considered remote nodes. The Authority 855 is included for completeness, occupying in some examples a central position within the topology. The Authority 855 may operate as described above.

The network 800 illustrates a scenario in which nodes in different network segments can experience limited abilities to connect to participant nodes in other network segments. For example, it is typical to assign local nodes a connectable address specific to the LAN 805. Thus, Node 840 would be unlikely to initiate a connection to Node 825, for example. However, if Node 840 were assigned a generally routable address, then every other node (e.g., Nodes 825, 830, 835, 850, and 845) would be able to establish the physical link to Node 840. Over this physical link, bi-directional inter-node communication can be established as discussed above with respect to FIG. 6. Note that, where possible, the participant nodes establish connections to multiple other participant nodes, as illustrated by the interconnectedness of the local nodes, Node 825, Node 830, and Node 835. FIG. 8 also illustrates likely passive nodes, such as tablets (e.g., Node 825) and mobile devices (e.g., Node 845).

As illustrated in FIGS. 6, 7, and 8, participant nodes can establish inter-node communication to as many participant nodes as they are able. However, such interconnectedness can waste computing resources, such as when the participant node count is high. The description of FIG. 6 above includes a discussion of using connection metrics to connect to only a subset of participant nodes in a collection. Further, the description of FIG. 2 includes a description of tracking participant node connection quality information in a collection schema. FIG. 9 illustrates an example of different participant node classifications for a node set 900 based on connection information. As illustrated, the classifications are in regard to Node A 905 as a target node.

Participant node classifications can include the entire node set 900, an active participant node subset 945, and a preferred participant node subset 950. The entire node set 900 includes all of Nodes A 905, B 910, C 915, D 920, E 925, F 930, G 935, and H 940. The participant node subset 945 is a subset of the entire node set 900 and includes Nodes A 905, B 910, C 915, D 920, E 925, F 930, and G 935. The preferred participant node subset 950 is a subset of the participant node subset 945 and includes Nodes A 905, B 910, C 915, D 920, and E 925. As illustrated, Node A 905 has limited inter-node communications to a connected subset of the preferred participant node subset 950 including Nodes B 910 and D 920.

The entire node set 900 can be all nodes that have contacted an Authority serving the collection to which Node A 905 is a participant. In an example, the entire node set 900 is limited to nodes that have, at some time, authenticated to an Authority for access to the collection. In an example, the entire node set 900 includes all nodes that are participants to the collection, including nodes that are inactive (e.g., indicated as not participating in an event channel or a data channel within the collection), such as Node H 940. The participant node subset 945 includes all nodes that are active or passive participants in the collection. For example, the complete connection paradigm illustrated in FIG. 6 and described above would involve connecting to all of the nodes in the participant node subset 945 (e.g., Nodes A 905, B 910, C 915, D 920, E 925, F 930, and G 935).

The preferred participant node subset 950 is selected from among the participant node subset 945 based on connectivity characteristics. These characteristics can include network performance, node cost, or monetary measurements, such as cost per byte transferred, latency, bandwidth, power source (e.g., mains power, battery, solar, wind, etc.), power remaining (in the case of a battery, fuel cell, etc.), processing power of the node, storage capacity of the node, additional features of the node (e.g., data versioning, hard site designation, physical location, etc.), proximity to the target node (e.g., Node A 905), among other things.

In an example, the connectivity characteristics can be determined by the Authority and communicated to the target node in the collection schema. In an example, other nodes (e.g., Nodes B 910, C 915, D 920, E 925, F 930, G 935, and H 940) report one or more of their connectivity characteristics to the Authority, for example, when they connect to the Authority. The Authority can sort the participant node subset 945 based on the connectivity characteristics. For example, the Authority can first sort the nodes based on the bandwidth measurement. To the extent that there are ties (e.g., equal weighting), the Authority can break the ties based on latency, and break further ties with a third measurement. In an example, each measurement can be weighted and combined to arrive at an overall score used for sorting the nodes. In an example, the measurements can be normalized to a single value range and combined.
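Both sorting approaches described above can be sketched briefly. The measurements, the weights, and the choice of min-max normalization are illustrative assumptions, as are the names normalize and score; latency is inverted so that lower latency scores higher.

```python
nodes = [
    {"name": "B", "bandwidth": 100, "latency": 20},
    {"name": "C", "bandwidth": 1000, "latency": 5},
    {"name": "D", "bandwidth": 100, "latency": 10},
    {"name": "E", "bandwidth": 50, "latency": 2},
]

# Approach 1: sort by bandwidth (higher first), breaking ties on latency (lower first).
by_tiebreak = sorted(nodes, key=lambda n: (-n["bandwidth"], n["latency"]))


def normalize(value, low, high):
    """Map value from [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


def score(node, weights, population):
    """Weighted sum of normalized measurements; higher is better."""
    bandwidths = [n["bandwidth"] for n in population]
    latencies = [n["latency"] for n in population]
    bandwidth = normalize(node["bandwidth"], min(bandwidths), max(bandwidths))
    latency = 1.0 - normalize(node["latency"], min(latencies), max(latencies))
    return weights["bandwidth"] * bandwidth + weights["latency"] * latency


# Approach 2: combine weighted, normalized measurements into one score.
weights = {"bandwidth": 0.7, "latency": 0.3}   # illustrative weights
by_score = sorted(nodes, key=lambda n: score(n, weights, nodes), reverse=True)
```

The two approaches can rank the same nodes differently: Node E has the lowest bandwidth, so it sorts last under the tie-break ordering, but its low latency lifts it to second place under these weights.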

Of the sorted subset of participant nodes 945, the subset of preferred participant nodes 950 can be selected via a threshold. For example, a threshold of four can lead the Authority to select the top four nodes from the sorted participant node subset 945. In an example, the threshold is a score threshold. That is, every node, regardless of how many nodes that may be, whose sort score is above the threshold will be included in the subset of preferred participant nodes 950. In an example, the collection schema not only indicates which nodes are in the subset of preferred participant nodes 950, but also indicates the score of each node. In an example, the portion of the collection schema sent to the target node (e.g., Node A 905) does not include any nodes other than the subset of preferred participant nodes 950.

Once the target node has the subset of preferred participant nodes 950, the target node can decide to which of these nodes it will connect, such as Nodes B 910 and D 920 when Node A 905 is the target node. In an example, the decision is based on the sort score in the collection schema. In an example, the decision can be random. In an example, the number of nodes to connect to is based on a predetermined threshold. In an example, the number of nodes to connect to is based on a resource policy for the target node. In an example, the connected nodes can include a node with a particular service, such as a versioning service. In an example, if the target node loses a connection, a new node from the subset of preferred participant nodes is selected. For example, if Node A 905 loses its connection to Node D 920, Node A 905 can choose to connect to either Node C 915 or Node E 925.
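The split between central selection of the preferred subset and local selection of connected nodes can be sketched as follows. The scores are made-up values chosen so that the result matches FIG. 9 as described, the score-based replacement is only one of the options the description allows (a random choice is another), and the function names are hypothetical.

```python
def select_top(scores, count):
    """Count threshold: the `count` highest-scoring nodes."""
    return sorted(scores, key=scores.get, reverse=True)[:count]


def select_above(scores, minimum):
    """Score threshold: every node whose score is above `minimum`."""
    return [n for n in sorted(scores, key=scores.get, reverse=True) if scores[n] > minimum]


def replacement(preferred, connected, lost, scores):
    """Pick the best preferred node that is neither connected nor the one just lost."""
    candidates = [n for n in preferred if n not in connected and n != lost]
    return max(candidates, key=scores.get) if candidates else None


# Illustrative scores for the participant nodes other than target Node A.
scores = {"B": 0.9, "C": 0.7, "D": 0.8, "E": 0.6, "F": 0.3, "G": 0.2}

preferred = select_top(scores, 4)          # chosen centrally by the Authority
connected = preferred[:2]                  # chosen locally by the target node
connected.remove("D")                      # the connection to Node D is lost
new_peer = replacement(preferred, connected, "D", scores)
connected.append(new_peer)
```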

Managing node connections as described above can provide efficient use of resources while still enabling robust and highly efficient data distribution. Further, the differentiated central management of selecting the subset of preferred nodes and local management of selecting connected nodes permits a highly dynamic and flexible connection methodology to increase performance or resiliency in inter-node communication within a collection.

FIGS. 10A and 10B illustrate two examples of a participant node providing a versioned data store for collection data, referred to as a “store point” node. Up to this point, collection participant nodes (e.g., nodes that are not the authority) have received equal treatment in describing how data is distributed among the nodes. Store point nodes do not operate any differently from this perspective. However, a distinction can be made between store point nodes and endpoint nodes in both the nature of the local data store and the services offered.

FIG. 10A illustrates an example of a store point node 1005A with Collection A event data 1010 and Collection B event data 1030. Instead of a traditional file system, such as may be provided from a common local data store in an endpoint node, the store point node 1005A includes a versioned data store for each collection to which the store point node is a participant, namely versioned data store 1015 and versioned data store 1035. The versioned data stores are configured to store block index and meta data as well as blocks. For example, the versioned data store 1015 includes block and meta data 1020 and block storage 1025 for Collection A and the versioned data store 1035 includes block and meta data 1040 and block storage 1045 for Collection B.

The versioned data store facilitates file system element versioning and data storage deduplication by managing file system element meta data including a version of the file system element and its constituent blocks, and storing those blocks. For example, if version one of file X consists of blocks M and N, and version two of file X consists of blocks M and O, the versioned data store can store blocks M, N, and O once, and also track which blocks belong to which version of the file. Thus, unlike in a file system, when the versioned data store already has a block locally that is needed for a newly created file system element, the block is not copied to the new file system element, but rather simply mapped to that element.
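As a non-limiting illustration, the file X example above can be sketched with a small in-memory structure. The class name VersionedStore, its method names, and the use of SHA-256 as the block fingerprint are assumptions made for illustration only.

```python
import hashlib


class VersionedStore:
    """Toy versioned data store: unique blocks plus per-version block lists."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes, each unique block stored once
        self.versions = {}  # (file_id, version) -> ordered list of fingerprints

    def put_version(self, file_id, version, blocks):
        fingerprints = []
        for data in blocks:
            fp = hashlib.sha256(data).hexdigest()  # assumed fingerprint function
            self.blocks.setdefault(fp, data)       # map to existing block, never copy
            fingerprints.append(fp)
        self.versions[(file_id, version)] = fingerprints

    def get_version(self, file_id, version):
        return b"".join(self.blocks[fp] for fp in self.versions[(file_id, version)])


store = VersionedStore()
M, N, O = b"block-M", b"block-N", b"block-O"
store.put_version("X", 1, [M, N])
store.put_version("X", 2, [M, O])
```

After both versions are stored, the store holds three blocks rather than four, and either version of file X can still be reassembled from its block list.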

The versioned data store can be encrypted, for example, using a collection key. Thus, as illustrated, each collection has its own versioned data store. Such an arrangement permits the store point to facilitate data versioning for disparate parties without fear that any party can access another's data. However, as illustrated in FIG. 10B, if two collections, such as Collection A and Collection C (e.g., store point 1005 has Collection C event data (block and meta data 1040)) share a key, they can share a versioned data store 1015. This situation may arise, for example, if Collection A is a personal collection and Collection C is a backup collection for a single user. The more collections that can be safely (e.g., securely) condensed into a single versioned data store, the more data deduplication can occur.

In an example, the collection key can be withheld from the versioned data store. In this example, the endpoint nodes have the collection key and encrypt blocks individually before sending to the store point node. All block fingerprints are thus based on the encrypted block. In this manner, the nodes may exchange blocks and block meta data with a store point node that is not under control of a party to the collection without fear that the data will be exposed to the store point administrator. In an example, the versioned data store can store file information, such as a unique ID of the file (e.g., unique to the collection or unique amongst all files), file version, or other meta data that does not reveal information about the file contents. In this example, the file name, or path, for example, can be stored in a group of meta data blocks that are part of the file, and thus encrypted prior to arrival at the versioned data store.

FIG. 11 is a network communication diagram of an example data distribution system 1100, including both store point nodes (e.g., Store Points A 1130 and B 1135) and endpoint nodes (e.g., Endpoints A 1120, B 1125, and C 1140). As described above in FIG. 6, the thin lines (e.g., line 1150) are Authority communication channels, and the thick lines (e.g., line 1145) are inter-node communications. Similarly, the nodes belong to either Collection A 1105 or Collection B 1110. Endpoint A 1120 participates in both Collections A 1105 and B 1110, while Endpoint B 1125 and Store Point A 1130 only participate in Collection A 1105 and Store Point B 1135 and Endpoint C 1140 only participate in Collection B 1110. The communications between nodes and the Authority 1115 and between nodes work in the same manner described above with respect to FIG. 6. That is, for standard data distribution, there is no distinction between store point nodes and endpoint nodes. However, as discussed above, if a node desires versioned data, it must request the versioned data from a store point node. Thus, for example, if the current version of File X is the third version, and a user on Endpoint B 1125 desires the second version of File X, Endpoint B 1125 can make a request (e.g., via an event channel) to Store Point A 1130 in Collection A 1105 for version two of File X. In an example, Store Point A 1130 provides the meta data of version two of File X to Endpoint B 1125 in response to the request. Endpoint B 1125 can then determine which blocks of File X version two it already possesses and request the missing blocks from Store Point A 1130. In an example, store points further store a complete event stream. Thus, in this example, the event stream trimming discussed above occurs at endpoint nodes. In other examples, store point nodes will also perform variations of event stream trimming.
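As a non-limiting illustration, the step in which Endpoint B determines which blocks of the requested version it still needs can be sketched as follows. The sketch assumes that the version meta data returned by the store point is an ordered list of block fingerprints; the function name missing_blocks and the single-letter fingerprints are hypothetical.

```python
def missing_blocks(version_fingerprints, local_fingerprints):
    """Return fingerprints needed for the version that are not held locally.

    Order follows the version's block list; each fingerprint is requested once.
    """
    requested = set()
    missing = []
    for fp in version_fingerprints:
        if fp not in local_fingerprints and fp not in requested:
            requested.add(fp)
            missing.append(fp)
    return missing


# Meta data for version two of File X, as returned by the store point (assumed shape).
version_two = ["M", "N", "M", "Q"]
# Blocks Endpoint B already holds because of the current (third) version.
local = {"M", "P"}
to_request = missing_blocks(version_two, local)
```

Only the fingerprints in to_request would then be requested from the store point; block M is reused from the local data store.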

FIG. 12 illustrates an example implementation of a participant node network 1200 including store point nodes and endpoint nodes. Implementing the network data distribution system 1100 discussed above with respect to FIG. 11, the network 1200 can include Endpoint Nodes 1205, 1215, and 1225, and Store Point Node 1235. Nodes 1205, 1215, and 1235 are participants to Collection A. Nodes 1205, 1225, and 1235 are participants to Collection B. The Endpoint Nodes 1205, 1215, and 1225 have single local data stores 1210, 1220, and 1230 respectively. Thus, Endpoint Node 1205 stores both Collection A data and Collection B data in the single local data store 1210. In contrast, Store Point Node 1235 has separate local data stores 1240 and 1245 to respectively store Collection A data and Collection B data. Further, as discussed above with respect to FIG. 11, local data stores 1240 and 1245 are versioned data stores and thus maintain the necessary meta data and content (e.g., block data) to maintain file system element versions. As illustrated, the solid connecting lines are Collection A inter-node communication while the dashed connecting lines are Collection B inter-node communication. All of the illustrated inter-node communications include both events and data (e.g., an event channel and a data channel).

While all of Collection A's participant nodes may be interconnected, FIG. 12 illustrates the case in which Endpoint Node 1205 is not directly connected to Endpoint Node 1225 although both are Collection B participants. In this case, events and data originating from the Endpoint Node 1225 will be synchronized with Store Point Node 1235. Because Endpoint Node 1205 maintains its synchronization with Store Point Node 1235, the changes originating from Endpoint Node 1225 will propagate to Endpoint Node 1205 via Store Point Node 1235. This example illustrates that the Store Point Node 1235 operates just like an endpoint node during data distribution. However, where any of the illustrated endpoint nodes are instructed to retrieve historical file system element information or data, they would need to make the request of Store Point Node 1235 because the local data stores 1210, 1220, or 1230 do not include versioned data stores.

FIG. 13 illustrates an example implementation of a participant node network 1300 with endpoint and store point nodes in a multi-segment physical network. As illustrated, all nodes are participants of a single collection. The shaded network connections (e.g., connection 1395) represent versioned data communication (e.g., requests for historical file system element information or data) and the unshaded connections (e.g., connection 1390) represent event and data communication (such as that described above with respect to FIG. 11). The network 1300 includes three network segments, a cloud network 1305, a LAN 1310, and a remote site 1315. The Authority 1370 is included for completeness, occupying a likely central position within the topology at the cloud network 1305. The Authority 1370 operates as described above. The Store Point Node 1320 is also illustrated as occupying a likely central position in the cloud network 1305. However, as store points are simply participants with respect to event and data transmission, additional store points could be included in any of the networks, such as the LAN 1310 or remote site 1315. Moreover, the Store Point Node 1320 could be absent from the cloud network 1305 and placed in either of the LAN 1310 or remote site 1315.

The Store Point Node 1320 can operate as described above with respect to FIG. 10. As illustrated, the Store Point Node 1320 participates in three collections, Collection A, Collection B, and Collection C. Thus, the Store Point Node 1320 at least includes Collection A event data 1325, Collection B event data 1350, and Collection C event data 1330. Further, Store Point Node 1320 includes two local data stores, versioned data structures 1335 and 1355. Versioned data structure 1335 includes file system element meta data 1340 and file system element content data 1345. Similarly, versioned data structure 1355 includes file system element meta data 1360 and file system element content data 1365. Versioned data structure 1335 differs from versioned data structure 1355 based on the multiple collection storage of Collections A and C on versioned data structure 1335. As described above, where policy allows, combining collections into a single versioned data store provides greater benefits to storage space utilization. In this example, the policy may be embodied with use of an encryption key to secure the contents of the versioned data structure 1335 that is common between the Collections A and C. Such a situation can arise, for example, if both Collections A and C are personal to a user. For example, Collection A can be a backup collection and Collection C can be a personal synchronization collection.

Due to the use of the versioned data structures 1335 and 1355, the Store Point Node 1320 can perform a service for other participant nodes that Endpoint Nodes 1375 and 1380 cannot; namely providing historical file system element information. Accordingly, the requests for this historical data are one way, from Endpoint Nodes 1375 and 1380 to Store Point Node 1320. In an example, the request can be performed as an event (e.g., via the event channel of inter-node communication). In an example, the request can be in a separate channel. Store Point Node 1320 can respond to the request using the inter-node communication described above. Store Point Node 1320 is depicted in FIG. 13 as a single machine, but in some examples, the functionality of the Store Point Node 1320 may be distributed across several interconnected servers and machines.

Inter-Node Communications

As previously discussed with respect to FIGS. 8-9, nodes in a particular collection may connect with each other in a peer-to-peer fashion. These connections may be utilized to exchange data (e.g., file contents and signals) with these other nodes. This node-to-node data exchange allows for the data distribution system to operate in a distributed, fault tolerant way. Rather than relying upon a centralized server to fulfill requests (which is a single point of failure for many data distribution systems), each node in the data distribution system may connect to one of many nodes to fulfill the request. Accordingly, each peer can fully operate with a connection to 0-n peers, and a centralized server peer is not required.

These node-to-node connections also allow for the creation of policies that promote greater resource efficiency. For example, as discussed in FIG. 8, these connections allow nodes to obtain content and events locally, e.g., in the same Local Area Network (LAN), from other nodes rather than connect to the cloud node 840, which may require connecting across a Wide Area Network (WAN) (e.g., the Internet). In the case of an enterprise environment in which the collection may be shared amongst several users on several devices within the same organization, the use of local connections may reduce the impact the collection has on network gateway computing resources. This is the direct result of less traffic between the server nodes (such as Cloud Node 840) and the Local Nodes (e.g., Local Nodes 825, 830, 835, and the like) to fulfill requests for events and data.

As previously detailed with respect to FIG. 9, nodes in a collection can be classified into a number of different sets: the entire node set 900, an active participant node subset 945, and a preferred participant node subset 950. When a node first joins a collection, the node may obtain information describing the entire node set 900. The node may then discover which nodes in this set are online and actively participating in the collection to determine the active participant node subset 945. The set of all the nodes in the collection may be obtained from an authority (or in other examples, a cloud-based node, or other accessible participant nodes). However, as discussed in the examples below, the nodes may self-determine which other nodes with whom to communicate.

In an example, the node may also obtain a subset of the entire node set 900 which is a preferred node set. The preferred node set includes nodes which have been determined (as described later) to be optimal for that node to connect to. The node may then discover which nodes in this set are online and actively participating in the collection to determine a preferred participant node subset. In some examples, the node may only connect with nodes in the preferred participant node subset. In some examples, the node may also connect with one or more nodes that are active and online (e.g., participating) and that are not in the preferred list (but are in the entire node set). This may occur, for example, if there are no participating nodes in the preferred list (e.g., they are all off-line). For ease of understanding, the remaining portions of this section will use the example in which the node connects only to those nodes that are part of the preferred participant node subset, but it will be understood by one of ordinary skill in the art with the benefit of this disclosure that the same techniques discussed herein could be applied to connections with nodes that are not part of the preferred participant node subset.

To determine if a particular node is active and participating, a participating node may attempt to establish communications with the particular node. For example, the nodes may engage in a discovery process. The entire node set and the preferred participant node subset may be updated on a regular basis as nodes are added and removed from the collection and as nodes power on and off and leave and join networks.

The set of nodes in the preferred node subset may be selected based upon a selection algorithm that may factor in one or more node characteristics of nodes of the collection. For example, the preferred node subset may be selected so that the nodes in the preferred node subset may be participant nodes which are in a same network segment (e.g., a same corporate network). This may be determined by a node's Internet Protocol (IP) addresses. In other examples, other various metrics may be utilized. For example, each participant node may calculate one or more network metrics that describe a bandwidth, reliability, or latency of the network connection between itself and the other participant nodes in the collection and these network metrics may be utilized to select the preferred set. In an example, the individual characteristics of the node may be utilized when determining the optimal set. For example, mobile devices may have limited resources and therefore may not be very reliable for transfer of events or block data. The preferred set may ensure that a node does not connect with too many mobile devices. Thus a more powerful node with better bandwidth may be placed in the preferred set instead of a mobile device which may be closer, network-wise. Other characteristics used in determining the preferred set may include: processing power of a participant node, memory of a participant node, the size of the locally available storage on the participant node (which may operate in connection with anticipatory caching/pre-fetching techniques), remaining battery life of the participant node, and the like. These device characteristics may be reported by the participant nodes themselves.
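As a non-limiting illustration, one possible selection algorithm is sketched below. It is only one of many rankings consistent with the description: it assumes that "same network segment" can be approximated by a shared IPv4 prefix, ranks same-segment nodes first and then by reported bandwidth, and caps the number of mobile devices. The names NodeInfo and preferred_set, and the parameters prefix_len and max_mobile, are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    name: str
    ip: str
    bandwidth_mbps: float   # self-reported device characteristic (assumed)
    is_mobile: bool = False


def preferred_set(target, candidates, size, max_mobile=1, prefix_len=24):
    """Rank same-segment nodes first, then by bandwidth, capping mobile devices."""
    segment = ipaddress.ip_network(f"{target.ip}/{prefix_len}", strict=False)

    def rank(node):
        same_segment = ipaddress.ip_address(node.ip) in segment
        return (not same_segment, -node.bandwidth_mbps)

    chosen, mobiles = [], 0
    for node in sorted((c for c in candidates if c.name != target.name), key=rank):
        if node.is_mobile:
            if mobiles >= max_mobile:
                continue  # avoid relying on too many mobile devices
            mobiles += 1
        chosen.append(node)
        if len(chosen) == size:
            break
    return chosen


target = NodeInfo("A", "10.0.0.5", 100.0)
candidates = [
    NodeInfo("laptop", "10.0.0.7", 100.0),
    NodeInfo("phone1", "10.0.0.8", 20.0, is_mobile=True),
    NodeInfo("phone2", "10.0.0.9", 10.0, is_mobile=True),
    NodeInfo("cloud", "52.1.2.3", 1000.0),
]
result = preferred_set(target, candidates, size=3)
```

Here the second phone is skipped in favor of the more distant but more capable cloud node, which mirrors the trade-off described above.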

In an example, participant nodes may attempt to connect to all of the participant nodes in the preferred participant node set. If participant node 830 from FIG. 8 has a preferred participant node set {Node 825, Node 835} then the participant node 830 would connect (as shown in FIG. 8) to those nodes. In another example, the participant nodes may connect to a subset of the preferred participant node set. For example, a participating node may have a limited ability to maintain connections with more than a particular number of other nodes. This may be due to processing ability, bandwidth, and the like. The subset of the preferred participant node set to which the node connects may be selected based upon the aforementioned node characteristics (e.g., bandwidth, latency, battery, processing power, and the like).

In an example, when retrieving event information or file information, a particular node may broadcast the request (e.g., a request for an event history, or a request for a file block) to all connected nodes and may fulfill the request from the first node that replies. In other examples, the node may select a node to fulfill the request from the group of nodes that responds. The node used to fulfill the request may be selected based upon a node's device characteristics. The use of device characteristics to select a node to fulfill the request may ensure that the request is not being filled by a node with a low latency and low bandwidth. A node with low latency may respond quickly (and therefore may respond first), but because of its low bandwidth the node may require a large amount of time to fulfill a request for content.
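As a non-limiting illustration, the trade-off between latency and bandwidth can be sketched with a simple cost estimate. The model (latency plus size divided by bandwidth) and the names Responder, estimated_transfer_time, and pick_responder are assumptions made for illustration; other device characteristics could be folded into the estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Responder:
    name: str
    latency_s: float            # time until the node starts answering
    bandwidth_bytes_s: float    # sustained transfer rate in bytes per second


def estimated_transfer_time(responder, size_bytes):
    """Rough cost model: fixed latency plus payload size over bandwidth."""
    return responder.latency_s + size_bytes / responder.bandwidth_bytes_s


def pick_responder(responders, size_bytes):
    """Choose the responding node expected to finish the transfer soonest."""
    return min(responders, key=lambda r: estimated_transfer_time(r, size_bytes))


phone = Responder("phone", latency_s=0.005, bandwidth_bytes_s=100_000)
server = Responder("server", latency_s=0.050, bandwidth_bytes_s=10_000_000)

large = pick_responder([phone, server], size_bytes=4 * 1024 * 1024)
small = pick_responder([phone, server], size_bytes=100)
```

The phone replies first in both cases, but under this model it is only the better choice for the very small request; the 4 MiB block is better served by the higher-bandwidth node.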

FIG. 14 illustrates a flow diagram 1400 for a method of connecting to other participant nodes. Node A 1405 comes online and sends a connect message 1420 to a peer service, for example, a peer service 1415 operated by an authority. (In other examples, a peer management or coordination service may execute or operate on another dedicated server, on a store point server, or on another participant node (e.g., a node selected by an election algorithm)). The authority peer service may facilitate the node-to-node connection process by responding with a list of nodes in the entire node set for the collection with message 1425. In some examples, the authority peer service 1415 may calculate a set of preferred nodes for Node A 1405 to connect with. This preferred set of nodes may be sent to Node A 1405 with message 1430. In some examples, messages 1425 and 1430 may be combined. Once Node A 1405 receives the list of nodes in the entire node set, Node A 1405 may attempt to discover which nodes are participating (e.g., which nodes are online). Node A 1405 may do this by sending a peer broadcast message 1432 to the other nodes. These messages may be sent to the IP addresses indicated for the nodes in the messages 1425 and 1430. In other examples, these messages may be sent to a broadcast address that may be specific to a particular collection or a particular service. As shown in the earlier portion of the flow diagram 1400, Node B 1410 is not yet online, so Node B 1410 does not respond to message 1432. At this time, only Node A 1405 is participating.

Once Node B 1410 comes online, Node B 1410 also sends a connect message 1440 to the authority peer service 1415. The authority peer service 1415 may respond with a list of nodes in the entire set with message 1445. In some examples, the authority peer service 1415 may calculate a set of preferred nodes for participant Node B 1410 to connect with. This preferred set of nodes may be sent to Node B 1410 with message 1450 and may differ from the preferred set of nodes sent to Node A 1405. In some examples, messages 1445 and 1450 may be combined. Once Node B 1410 receives the lists of node sets, Node B 1410 may attempt to discover which nodes are online and participating. Node B 1410 may do this by sending a peer broadcast message 1455 to the other nodes. These messages may be sent to the IP addresses indicated for the nodes in the messages 1445 and 1450. In other examples, these messages may be sent to a broadcast address that may be specific to a particular collection or a particular service.

As shown in FIG. 14, Node A 1405 is online and participating. In the example illustrated in FIG. 14, both nodes A and B are listed in the set of all nodes in the collection and both are on each other's preferred list. Node A 1405 then determines whether to connect with Node B 1410, because Node B 1410 is now also a participant node after coming online. As already noted, device characteristics of Node B 1410 may determine whether or not Node A 1405 makes this connection. In an example, device characteristics may be exchanged between Node A 1405 and Node B 1410 at this time. For example, Node B 1410 may send these device characteristics in the Connect Request 1460. In another example, Node B 1410 and Node A 1405 may utilize a shared scoring algorithm that may factor in one or more device characteristics. Rather than reporting on a multitude of statuses, each node may score itself according to the scoring algorithm and communicate only the score to each other. If Node A 1405 determines to connect with Node B 1410, then Node A 1405 will send a connect message 1460.

In some examples, each participating node may connect only to particular other participating nodes in order to impart structure on the node-to-node connections. In an example, this structure may be configured as a ring structure network topology. (A ring structure is one example of a network topology configuration specifying how a subset of nodes will connect to only a subset of other nodes, but it will be understood that other types of network topology and configurations may be used, including bus, star, rig, circular, tree, and line configurations). For example, each particular participating node may maintain two connections to other online, participating nodes. Each node may have a unique node identifier. The first connection may be made by searching for a participating node that has a unique id that is the smallest unique id for participating nodes that is still greater than the particular node's unique id. This may be termed the “high” connection. The second connection may be made by searching for the online node that has a unique id that is the greatest unique id that is still smaller than the particular node's unique id. This may be termed the “low” connection. Unique id's may be assigned by an authority when the node first connects to the collection.

In order to ensure that a participating node has at least two connections, if no participating nodes have unique id's that are higher than the unique id of the particular node, then the particular node may connect high to the lowest participating node. Similarly, if no participating nodes have unique id's that are lower than the unique id of the particular node, then the particular node may connect low to the highest online node in the list. In all cases, new participating nodes coming online may cause changes in the connections made.

As an example, suppose the list of preferred participating nodes contained four nodes, A, B, C, and D, with IDs 1-4 respectively. If Node A and Node B are the only nodes currently participating, Node B may connect low to Node A and Node A may connect high with Node B. This result is shown in FIG. 15A. Now if Node C comes online, Node C will attempt to connect low with Node B since Node B has the highest unique id (2) that is still lower than Node C's unique id (3). Node B will connect high with Node C because Node C's unique id (3) is the lowest unique id that is still higher than Node B's (2). Because there are no nodes online with a higher unique ID than Node C (Node D is offline), Node C's search for a high connection with use of the ring structure will reach the end of the node list, at which point Node C will wrap around the node list and connect high with Node A. Node A will connect low with Node C because there is no node that has a lower unique id than Node A, so Node A will wrap around to the highest unique id in the list, which is Node C. This result is shown in FIG. 15B.

If Node D comes online, then Node D will connect low with Node C. Node D will also connect high with Node A. Node C will then disconnect from Node A, as Node A is now connected to both Node B and Node D, and Node C is connected to Node B and Node D. This result is shown in FIG. 15C. The disconnection is shown as a dotted arrow. In FIG. 15C each node has two connections and the network forms a ring. Any new nodes will insert themselves in the ring in position based upon their unique ids and the surrounding nodes may reconfigure their connections to connect with the new node. Similarly, when a node goes offline and ceases to participate in the data distribution system, the remaining nodes may reconfigure the connection. (It will be understood that variations to the techniques described in connection with FIGS. 15B and 15C may be implemented based upon user, administrator, and network settings, and that other operations for identifying high connections will be performed for other, non-ring-based network topologies).
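As a non-limiting illustration, the "high" and "low" connection rules with wrap-around can be sketched as a single function that each node evaluates against the set of currently participating unique ids. The function name ring_neighbors is hypothetical, and the ids 1-4 correspond to Nodes A-D in the example above.

```python
def ring_neighbors(node_id, online_ids):
    """Return (low, high) connection targets for node_id, wrapping around the list.

    high: smallest online id greater than node_id, else the lowest online id.
    low:  greatest online id smaller than node_id, else the highest online id.
    Returns (None, None) when no other node is online.
    """
    others = sorted(i for i in online_ids if i != node_id)
    if not others:
        return None, None
    higher = [i for i in others if i > node_id]
    lower = [i for i in others if i < node_id]
    high = higher[0] if higher else others[0]
    low = lower[-1] if lower else others[-1]
    return low, high


# FIG. 15B situation: Nodes A (1), B (2), and C (3) online, Node D (4) offline.
three_online = {n: ring_neighbors(n, {1, 2, 3}) for n in (1, 2, 3)}
# FIG. 15C situation: all four nodes online.
four_online = {n: ring_neighbors(n, {1, 2, 3, 4}) for n in (1, 2, 3, 4)}
```

Re-evaluating the function whenever the set of participating ids changes yields the reconfiguration described above: once Node D is online, Node C's high connection moves from Node A to Node D, and Node A's low connection moves from Node C to Node D.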

FIG. 16 illustrates a flowchart 1600 of an example method performed by a node to connect to another participating node according to some examples of the present disclosure. As shown, the participating node may go online and send a connection message to a peer service such as an authority peer service (operation 1605). As already noted, the peer service may operate on an authority, although in other implementations the peer service may operate with use of a store point node, another participating node, or the like. The participating node may receive information on all the nodes in the collection (operation 1610). In some examples, the entire node set may be exchanged only upon first joining a collection. In these examples, the message received at operation 1610 may include any changes to the list since the participating node last connected with the authority peer service. The authority peer service may calculate a preferred set of other nodes for the node to connect to (operation 1615). The node may broadcast its presence to other nodes in either or both of the entire node set and the preferred node set and discover the nodes in either or both sets (operation 1620). Based upon the participant nodes in the entire node set or the preferred node set, the node may determine the connections that it will make (operation 1625). The node may then establish those connections (operation 1627). If, at any time, the participating node discovers new connections, such as at operation 1630, the participating node may repeat operations 1625 and 1627. In an example, the particular node may first try connecting to participant nodes in the preferred set before attempting to make connections with participating nodes not in the preferred set that are in the set of all nodes in the collection. If a new connection is received (operation 1630) the node's connections may be re-evaluated (operation 1625) and any connection changes may be made (operation 1627).

FIG. 17 shows a schematic block diagram 1700 of a participating node 1705 and an authority peer service 1730 establishing peer connectivity according to some examples of the present disclosure. Input and Output module 1710 at the participating node 1705 may establish communications with other participant nodes, the authority peer service 1730, store point nodes (and, as applicable, nodes running other peer services). Input and Output module 1710 may send connection broadcasts, receive connection broadcasts, establish connections, receive node lists, and the like. Connection module 1715 may receive the entire node set and the preferred node set from the authority via the input and output module 1710. Connection module 1715 may also receive information on which nodes are currently participating from the input and output module 1710. Connection module 1715 may utilize this information to determine which nodes to connect with. Input and output module 1710 may then make the connections determined by the connection module 1715. Data store 1725 may store the entire node list, the preferred node list and connection information. Other modules 1720 may perform one or more of the other functions disclosed herein.

Input and Output Module 1735 at the authority may receive connection messages from participating nodes and send preferred and entire node set lists in response. Selection module 1740 may determine a set of preferred nodes according to the algorithms disclosed herein. Other modules 1720 may perform one or more of the other functions disclosed herein. Data store 1750 may store the participating nodes, one or more preferred node sets for one or more nodes, and device characteristics for nodes. The device characteristics may be utilized by the selection module 1740 to determine the preferred node sets.

Client-Side File Replication and Transfer Deduplication

In connection with the data distribution architecture previously described, various actions may occur among the nodes of system 100 to synchronize and transfer file system elements (e.g., data for implementing folders and files). The following describes various techniques for implementing client-side “deduplication” techniques for the data transfers occurring among the various file system nodes.

As referred to herein, these deduplication techniques involve the general concept of fingerprinting file system elements that are being shared and transferred, and dividing each file into separate units referred to as “blocks” or “chunks.” These blocks may be shared, transmitted, received, and in some cases stored as separate, unique units. The deduplication techniques when applied to data transfers as described herein provide a mechanism to prevent duplication of unnecessary data transfers, and to reduce the amount of bandwidth, processing power, and memory that is used to synchronize and transfer data.

Existing deduplication techniques in storage file systems (such as archival file management systems) apply fingerprinting and file system chunking/blocking for purposes of storing fewer copies of duplicate data blocks. Thus, the term deduplication is typically applied to mechanisms that prevent the duplicate storage of data. As described herein, such deduplication concepts may also be applied for purposes of file replication and file system data transfers across networks and among nodes (e.g., peers) of a data distribution system. Because individual data files are divisible into blocks, the events used to propagate the changed data files can relate to the access and transfer of individual file blocks within the changed data files.

As further detailed herein, the use of deduplication techniques to identify blocks for data transfers can enable the usage of data blocks from any of a plurality of collections. This is because the deduplication operations that occur at a local node do not require a key for accessing a particular collection, and a deduplicated index of known blocks across multiple collections can be maintained for purposes of managing the file data. As a result, file deduplication chunking/blocking techniques may utilize indexed block data to assist a participating node in independently building a same picture of a file from data in multiple collections.

As previously described with reference to FIG. 5A and FIG. 5B, the data store that is associated with a particular client node (e.g., data store 520A, limited data store 520B) may be accompanied by a block index 525. This block index can be used to identify file blocks that will recreate files or portions of files. For example, if as part of a synchronization of file system data for a particular collection, the node 505A or 505B is instructed to add a new file to the collection A or collection B, the node 505A or 505B can check whether one or more of the file blocks needed to recreate the new file are already stored in the block index 525. Blocks already stored in the block index 525 need not be transferred over the network, and can be used with other blocks in the block index 525 to recreate the new file. In some examples, the block index 525 may include indexing data for multiple collections (e.g., multiple shared collections, a personal collection, and a backup collection). Thus, the block index 525 may serve to catalog a large number of blocks existing throughout the file system storage on the particular node.

FIG. 18 illustrates a more detailed example of a block tracking andindexing mechanism used in a node of a distributed data storage systemaccording to one example. As shown, the portable block catalog 1810 isprovided for cataloging data within the storage system. The portableblock catalog 1810 provides a schema for determining, managing, andstoring block meta data in a uniform fashion throughout the distributeddata storage system, and can provide identification of node-specificuses of known blocks (either in a file system data store 1820 orversioned data store 1830). This portable block catalog 1810 may existon each participating node in the distributed data system 100, forexample.

As shown in the example of FIG. 18, the portable block catalog 1810 includes a block index 1825 used to store entries of known file blocks from a plurality of files. The blocks are linked from the block index 1825 by a block pointer 1840 to the particular storage location in the data store. For example, a block pointer stored in the block index 1825 may include a reference to a combination of: a file identifier (of the file providing the block), an offset (within the file providing the block), a length (of the block within the file), and a hash value identifier (e.g., an MD5 hash of the intended block for verification purposes). The block index 1825 further includes a block index cache 1827 used to load and persist a subset of block identifier cache entries which are likely to be used and accessed (e.g., indexing information for contiguous blocks of a particular file).
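The block pointer fields listed above can be illustrated with a minimal sketch. The names `BlockPointer`, `BlockIndex`, and `read_block` are hypothetical and chosen only for illustration; the in-memory dictionary stands in for whatever persistent store an implementation might use, and the block index cache 1827 is not modeled.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BlockPointer:
    """Hypothetical block pointer: where a known block lives in the data store."""
    file_id: str      # identifier of the file providing the block
    offset: int       # byte offset of the block within that file
    length: int       # length of the block in bytes
    block_hash: str   # MD5 hex digest of the intended block, used for verification


class BlockIndex:
    """Hypothetical in-memory block index keyed by block hash."""

    def __init__(self) -> None:
        self._entries: Dict[str, BlockPointer] = {}

    def add(self, file_id: str, offset: int, data: bytes) -> BlockPointer:
        # Keep the first pointer recorded for a given hash (one entry per known block).
        pointer = BlockPointer(file_id, offset, len(data), hashlib.md5(data).hexdigest())
        return self._entries.setdefault(pointer.block_hash, pointer)

    def lookup(self, block_hash: str) -> Optional[BlockPointer]:
        return self._entries.get(block_hash)


def read_block(files: Dict[str, bytes], pointer: BlockPointer) -> bytes:
    """Read the block a pointer refers to and verify it against the indexed hash."""
    data = files[pointer.file_id][pointer.offset:pointer.offset + pointer.length]
    if hashlib.md5(data).hexdigest() != pointer.block_hash:
        raise ValueError("block content no longer matches the indexed hash")
    return data
```

In this sketch the hash doubles as the lookup key and as the verification value, so a stale pointer (for example, after the source file was modified) is detected when the block is read rather than silently reused.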

The block index 1825 further includes a bloom filter 1829 used for determining whether a particular file system block is located in the block index 1825. The bloom filter 1829 operates as a space-efficient probabilistic data structure that can be used to quickly determine whether a particular block is not present in the block index (and thus is not present in the data store or other source location). The bloom filter 1829 may be placed and loaded into memory or other fast access storage, and used as a mechanism to avoid checking the entire contents of the block index 1825, and to avoid the need to load the entire contents of the block index 1825 into memory (or repeatedly access the disk to check values in the block index 1825). The bloom filter 1829 may be configured to take a portion of the hash value of a block as a key, even assuming that a collision among multiple hashes is a possibility. (A bloom filter is useful for determining whether the particular block is not present, because a “no” is always a no, and a “yes” is a “possibly”. Thus, a collision among multiple hashes does not directly cause adverse consequences, because the “possibly” result will be checked when retrieved.) The bloom filter 1829 may be used to provide an indication of a match for the full contents of the block index 1825, or for portions of the block index 1825 (such as tracking whether a particular block is stored in the block index cache 1827 or another intermediate cache).
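The "no is always a no, yes is a possibly" behavior can be sketched as follows. This is a minimal illustration under assumptions: the class `BloomFilter`, the helper `find_block`, the bit-array size, the number of hash functions, and the use of the first 8 bytes of the block hash as the key are all hypothetical choices, not details taken from this description.

```python
import hashlib


class BloomFilter:
    """Hypothetical bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def find_block(block_hash: str, bloom: BloomFilter, index: dict):
    """Consult the in-memory filter first; only a 'possibly' reaches the full index."""
    key = bytes.fromhex(block_hash)[:8]  # a portion of the block hash serves as the key
    if not bloom.might_contain(key):
        return None                      # definite "no": skip the index entirely
    return index.get(block_hash)         # "possibly": confirm against the block index
```

Because every "possibly" is confirmed against the index, a false positive in the filter costs one extra lookup but cannot cause a wrong block to be used.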

In an example, a particular block may be embodied by storage on a filesystem data store 1820 (e.g., within a hierarchical file system of anendpoint node), or storage within a versioned data store 1830 (e.g.,within a block-based store of a store point node). Within the filesystem data store 1820, a particular file 1822 is constructed from oneor more blocks 1824. Within the versioned data store 1830, a particularfile version 1832 of a particular file is also made up of one or moreblocks 1834. In some examples, the versioned data store 1830 mayimplement file storage deduplication techniques to remove or reduce thenumber of duplicate blocks and duplicate files. For example, filestorage deduplication techniques may be extended across duplicate blocksof a file, duplicate blocks of a particular machine or file system,duplicate blocks of a particular user or user group, duplicate blocks ofa particular shared collection, duplicate blocks of enterprises, orglobally within the versioned data store.

As previously described with reference to FIG. 10A and FIG. 10B, thedata store that is associated with a particular store point node 1005A,1005B may include a versioned data store 1015. This versioned data store1015 may be embodied by a block-based file system collection, providingstorage for blocks in groups of blocks and meta data to track the blocksprovided by the various file versions. For example, the versioned datastore 1015 may provide storage of numerous blocks in a separate orconsolidated versioned data store for a plurality of collections, users,devices, or enterprises.

FIG. 19 provides an illustration of a relational data schema 1900 for aversioned data store 1910, indicating the relationship and data fieldsamong fields for versions of a plurality of blocks. The plurality ofblocks may be tracked with entries in the block index and meta data 1920as the blocks are stored within a block storage structure 1940. As shownin FIG. 19, the block storage structure 1940 at the versioned data store1910 is embodied by an instance of a block data file (BDF) 1946 whichincludes a plurality of data attributes. For example, the BDF may beembodied as a 4 GB silo of raw data (with numerous BDFs used to storedata for a particular collection). As also shown in FIG. 19, the blockindex and meta data 1920 at the versioned data store 1910 is representedby relationships among a file version index 1922, a file version file1924, a parent child index 1926, a file metadata file 1928, a group ofblocks index 1930, a block metadata file 1942, and a block checksumindex 1944. Each of the file structures to store the data schemainformation may be implemented as a separate database (e.g., a separateLevelDB database). The relationship between various data fields in theblock index and the meta data 1920 is further indicated in FIG. 19,although it will be understood that other data structures and schemas(including non-relational data structures) could be used to track andmaintain similar meta data fields.

The use of block indexing and block tracking for individual filesenables the block-level benefits of deduplication to be applied forsynchronized data transfers between a variety of file systems, includingin hierarchical file systems that store a plurality of duplicate filesin multiple locations. In addition, when requesting data for a filesynchronized via a data distribution mechanism, the data may befulfilled at least in part with the use of individual blocks storedlocally on a data store of a node. Because each node maintains a mappingof stored data files and data file blocks within its data store, anevent for the data distribution mechanism can be fulfilled (andverified) from the data and meta data already maintained for theidentical blocks on the node data store.

FIG. 20 provides an illustration of a file system event sequence 2000 using block deduplication techniques for a file according to one example, for a scenario where all of the file blocks for a particular file are available and stored locally. As shown, a first node, Node 1 2002, intends to fulfill a synchronized file system event for the particular file, and Node 1 2002 requests file meta data (operation 2010) from a second node, Node 2 2004. Node 2 2004 performs operations to retrieve block meta data for the particular file from a local block index (operation 2012). Node 2 2004 then returns the file block meta data to Node 1 2002 for further processing (operation 2014), with the file block meta data containing an identification (e.g., hash or fingerprint keys of the various blocks, such as produced by an MD5 hash algorithm) of all blocks for the desired file. Node 1 2002 can determine, from the received file meta data, that it already has a matching file with all matching blocks (operation 2016) (or, alternatively, that all matching blocks exist on the file system but are sourced from more than one file). If all blocks from the desired file match entries in the block index of Node 1 2002 (e.g., from a duplicate file on the local file system), then the file copy replication process will operate to locally replicate the file. This replication involves obtaining blocks from the matching file indicated by the block index (operation 2018) and retrieving and writing the blocks to the new file (operation 2020).

FIG. 21 provides an illustration of a file system event sequence 2100 using block deduplication techniques for transfer of file blocks according to one example, for a scenario where some (but not all) of the file blocks for a particular file are stored locally. As shown, a first node, Node 1 2102, intends to fulfill a synchronized file system event for the particular file, and Node 1 2102 requests file meta data (operation 2110) from a second node, Node 2 2104. Node 2 2104 performs operations to retrieve block meta data for the particular file from a local block index (operation 2112). Node 2 2104 then returns the file block meta data to Node 1 2102 for further processing (operation 2114), with the file block meta data containing an identification (e.g., hash or fingerprint keys of the various blocks, such as produced by an MD5 hash algorithm) of all blocks for the desired file. At this point, Node 1 2102 can determine from the file meta data that it already has one or more matching blocks (operation 2116). The file copy replication process will operate to locally replicate as much of the file as possible, by determining from the block index whether the matching blocks are stored locally (operation 2122), and retrieving and writing the locally stored blocks to the new file (operation 2124). The file copy replication process will also replicate the remaining blocks at Node 1 2102 by requesting the remaining blocks from another node (operation 2126), retrieving the block data at Node 2 2104 from a data store (e.g., an archive or file system data store) (operation 2128), and receiving the block data for the particular file from Node 2 2104 (operation 2130). Node 1 2102 will then write the remaining block data to the file (operation 2132). The reconstructed file(s) may be subsequently validated with a hash algorithm (e.g., a SHA-256 checksum of the entire file) for post construction accuracy. (It will be understood that the file request and write operations may occur in a different sequence than that depicted in FIG. 21, as operations to write synchronized data may occur prior to retrieving local data.)

FIG. 22 depicts a flowchart 2200 of node operations (e.g., client orendpoint node operations) for performing a deduplicated file replicationaccording to one example. These operations may be performed in thesystem 100 implementing a data distribution mechanism, for example, at adestination node in response to a synchronized event stream indicatinginformation for a particular file.

As shown in flowchart 2200, the desired file for storage is identified in a local data store or data source of a node (operation 2202). For example, this desired file may be identified as part of an event received from a distributed data synchronization collection, or may be a particular version of the file selected by the user for restoration. The node then requests the meta data of the desired file from a remote source, such as another node or a store point node (operation 2204). Meta data for the desired file is then received from the remote source and processed (operation 2206). For example, this meta data may indicate the hash values or other unique (or nearly-unique) identifiers of individual blocks of the desired file. The meta data then can be used by a receiving node to select the blocks that the node needs in order to reconstruct the file.

The desired file for storage may be reconstructed from the use of locally available blocks and remotely received blocks. Locally available blocks may be identified on a local data store (operation 2208), such as with use of a bloom filter, predictive data element, or other cache stored in memory that provides an indication of whether the blocks are indexed (e.g., exist in a block index) in a location of a local data store. The locally unavailable blocks may be identified and requested from a remote source (operation 2210). The combination of the locally identified and remotely identified and received blocks then can be used to reconstruct the desired file (operation 2212).
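The reconstruction flow of operations 2208 through 2212 can be sketched as a single function. This is an illustrative sketch only: `reconstruct_file`, `md5_hex`, and the `fetch_remote` callback are hypothetical names, the local store is modeled as a dictionary from block hash to block bytes, and the network request is abstracted behind the callback.

```python
import hashlib
from typing import Callable, Dict, List


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def reconstruct_file(
    block_hashes: List[str],
    local_blocks: Dict[str, bytes],
    fetch_remote: Callable[[List[str]], Dict[str, bytes]],
) -> bytes:
    """Rebuild a file from its ordered block hashes (the received meta data)."""
    # Identify blocks that are not available locally, requesting each one only once.
    missing = [h for h in dict.fromkeys(block_hashes) if h not in local_blocks]
    remote_blocks = fetch_remote(missing) if missing else {}

    parts = []
    for h in block_hashes:
        data = local_blocks[h] if h in local_blocks else remote_blocks[h]
        if md5_hex(data) != h:  # verify every block, local or remote, before use
            raise ValueError(f"block {h} failed verification")
        parts.append(data)
    return b"".join(parts)
```

When every block is already present locally (the FIG. 20 scenario), `missing` is empty and no remote request is made; otherwise only the missing blocks are requested (the FIG. 21 scenario).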

FIG. 23 depicts a flowchart 2300 of node operations (e.g., server orstore point node operations) for providing data for a deduplicatedreplication of a particular file according to one example. Theseoperations may be performed in response to a request for a particularfile that is designated to be distributed to a destination node in thedistributed data synchronization architectures described herein. In someexamples, multiple nodes may provide the operation of the flowchart 2300(for example, with a first node providing meta data for a particularfile in operations 2302, 2304, 2306, and a second node providing blockdata for the particular file in operations 2308, 2310, 2312).

As shown in flowchart 2300, a request for meta data of the particular file is received (operation 2302). This request, if received by a store point node, may indicate the particular version to be retrieved. The request is processed to obtain meta data for the blocks associated with the particular file (operation 2304). The meta data for the blocks associated with the particular file is then transmitted to the requesting node (operation 2306).

Further operations to provide one or more blocks of the particular file include the processing of the request for the blocks indicated by the meta data. This may include the receipt of the request for one or more particular blocks of the desired file (operation 2308), and retrieving the block data from a data store based on the request (operation 2310). This processing may also involve the use of a bloom filter, other predictive data element, or other cache stored in memory to determine whether all of the blocks are already stored or available at the data store. The block data of the particular file is then transmitted to the requesting node (operation 2312).

The presently described techniques for block identification anddeduplicated file transfers may also be applied in connection with thetransfer of versioned files. For example, a newer version of a file thatexists on an endpoint node may be constructed from unchanged locallyavailable blocks and changed remotely available blocks, thus onlyrequesting a transfer of the changed blocks. Further, requests forversioned file blocks for a particular collection may be obtained from astore point node or other participating node in a system implementing adata distribution mechanism.

The division or “chunking” of the file into various blocks may be based on any number of techniques or algorithms. In one example, a Rabin algorithm is used to identify boundaries of potential blocks in a predictable yet variable sized manner. (The block size provided to the Rabin algorithm may vary based on implementation or file sizes, but may include an average of 32K for large files, for example.) Once the blocks are divided, the blocks can then be individually fingerprinted with a hash algorithm (such as MD5), and the resulting hash value serves as the identifying key of the block that is exchanged in meta data for a listing of blocks of the desired file.
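Content-defined chunking followed by per-block fingerprinting can be sketched as below. Note that this sketch substitutes a simple polynomial rolling hash for a true Rabin fingerprint, and the window size, modulus, and block size limits are small hypothetical values chosen for illustration (far below the 32K average mentioned above). The function names `chunk` and `fingerprint_blocks` are likewise assumptions.

```python
import hashlib
from typing import List, Tuple

WINDOW = 16            # bytes covered by the rolling hash (hypothetical value)
BASE = 257
MOD = (1 << 31) - 1


def chunk(data: bytes, avg_size: int = 64, min_size: int = 16,
          max_size: int = 256) -> List[bytes]:
    """Split data at boundaries chosen by a rolling hash over the last WINDOW bytes."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the contribution of the oldest byte.
            rolling = (rolling - data[i - WINDOW] * top) % MOD
        rolling = (rolling * BASE + byte) % MOD
        size = i - start + 1
        # Cut when the hash hits the target pattern, or when the block grows too large.
        if (size >= min_size and rolling % avg_size == avg_size - 1) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])    # trailing block, possibly shorter than min_size
    return chunks


def fingerprint_blocks(data: bytes) -> List[Tuple[str, int]]:
    """Return (MD5 hex digest, length) for each block, as exchanged in block meta data."""
    return [(hashlib.md5(c).hexdigest(), len(c)) for c in chunk(data)]
```

Because a boundary depends only on the bytes inside the rolling window, an insertion near the start of a file tends to shift only nearby boundaries, so later blocks can keep their fingerprints and be found in the block index.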

In further examples, the deduplication of blocks at storage nodes (suchas store point nodes) may be performed in connection with duplicationamong multiple collections of data (e.g., plans). These may be used tofacilitate efficient file transfer operations in a number of filesynchronization scenarios. For example:

In a scenario where a subject file is moved (or renamed) to result in achange of the membership of the subject file from a first collection toa second collection, a node may perform operations to access and copythe underlying blocks as indicated by the local block index. As will beevident, in a file system move or rename operation, the blocks of thesubject file do not change and are available on the local data store.Thus, operations may be performed so that a node retrieves no blocksfrom a remote node, but instead copies blocks available to the node inthe local data store to recreate the subject file.

In a scenario where a subject file is added to a collection, but whereexisting blocks of the subject file already exist locally and areindexed (e.g., the blocks are indexed as part of another localcollection), the node may request only missing blocks to fulfill thefile system events for the collection. In this manner, the node mayrequest as few as one new block for addition of the subject file to thecollection.

In a scenario where different collections use different encryption keys,the use of a multiple-collection block index at the node allowsretrieval and use of blocks among the different collections at the samenode. This is particularly useful when creating identical files whereall blocks for the file exist locally (but may be stored or associatedwithin another collection). Thus, the use of the block index enablesidentification and retrieval of blocks from across collections in theoriginal data store data, regardless of key constraints on thecollection.

In a scenario where data files from a node are provided to a versioned data store, the use of deduplicated block transfers enables a reduction in bandwidth and processing to store information in an archived data collection. In this fashion, the deduplicated file transfer will be enabled to deduplicate blocks according to version history (and not require transfer of blocks that are identical to other version(s) of the blocks already stored). In a similar fashion, if the destination has access to the keys, the deduplicated file transfer will be enabled to deduplicate blocks across plans that share the same key (such as personal and archive collections). The storage of the block at the versioned data store, with one instance within an archive, satisfies the data needed for all plans within the archive.

FIG. 24 is a block diagram 2400 that illustrates a system including anode 2405 configured for performing a file replication using blockdeduplication techniques, according to an example. The modules describedin FIG. 24 may implement the deduplicated file transfer functions asdescribed herein and the modules may be in addition to, or instead ofthe modules described in other sections in this specification. As shown,the node 2405 includes a data store 2401 storing the data files (e.g.,in a hierarchical file system), and a meta data index 2402 includinginformation on the data files. The node 2405 includes a file chunkingmodule 2410 that is configured to divide the file into blocks (chunks),to produce blocks with use of a Rabin algorithm or other variable sizealgorithm. The node 2405 further includes a block indexing module 2420configured to perform block indexing to identify and track the variousblocks in the data store 2401, including maintaining information in themeta data index 2402. The node 2405 further includes a nodecommunication module 2430 configured to initiate (or fulfill) datarequests with other nodes in connection with deduplicated blocktransfers among the nodes, a file storage module 2440 configured toperform local storage of the various transferred blocks into files, anda file versioning module 2450 configured to manage versioning of thefiles (and to request or transfer versioned data blocks in connectionwith the retrieval or storage of blocks from particular file versions).

Anticipatory Storage

As already noted with respect to FIG. 5B, some nodes may be limited instorage space. Other nodes may be limited in network bandwidth,processing power, or other characteristics. As such, to save resources,a particular node may store the contents of only a subset of theelements of the collection on local storage of a node. As previouslynoted, the set of elements of the collection that is locally stored iscalled the locally available set of elements. In examples in which theavailable set of elements consists of a subset of elements in thecollection, the system must determine which elements of the collectionto keep locally available.

In some examples, the locally available set of elements may be the set of elements of the collection that the data distribution system determines to have the highest probability of being accessed in the future by the user of the node. Elements in the collection that are not in the set of locally available elements may be referred to herein as non-locally available elements. Unlike for members of the locally available set, the actual contents of non-locally available elements may not be stored in local storage of the node. However, meta data corresponding to the non-locally available elements (e.g., name, size, and the like) may be stored to allow users to select and retrieve the contents should the user desire.

Both locally available and non-locally available elements may beaccessible to users through a user interface corresponding to the datadistribution system. When a user selects a locally available element theuser's interaction experience is identical to a locally stored filesystem element because it is locally stored. When a user selects anon-locally available element, the node first obtains it over a networkfrom another node or server in the data distribution system. Because thenon-locally available element is not locally stored, interacting with anon-locally available element requires a network connection.

This predictive storage system thus achieves a balance betweenconsumption of local resources and timely access of important elementsin the collection by keeping some items that are likely to be used inlocal storage for convenient access and allowing users to access theremaining items over a network, thus saving local storage space andmemory usage. This is in contrast to systems that store all contents ofall shared items at all times (which may utilize enormous storageresources) and systems that only provide meta data about elements andretrieve those elements on-demand from users (which require a networkconnection at the time the user desires the content).

In order to determine the locally available set, the system may employpredictive algorithms that may utilize one or more signals whichindicate a likelihood that the user will interact with a particularelement of the entire collection in the future. The signals may includethe usage history of elements in the collection, user interestinformation, user context information corresponding to the usagehistory, the current context of the user, and the like.

The elements in the locally available set may vary over time as usage history, user preferences, context, and device characteristics change. For example, elements that have fallen into disuse may be demoted from the locally available set into the non-locally available set, and elements used frequently in the recent past may be promoted from the non-locally available set to the locally available set. An element that is demoted from the locally available set may be removed from local storage of the node. In addition, the free space of a device may vary over time. For example, as free space of the device decreases, the system may shrink the size of the locally available set to free up additional space for other user applications. Likewise, if the free space of the device increases, elements from the non-locally available set may be promoted to the locally available set.
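Promotion and demotion against a changing storage budget can be sketched as follows. The greedy highest-score-first selection, the function names `select_locally_available` and `plan_changes`, and the per-element score inputs are assumptions made for illustration; the description does not prescribe a particular selection policy.

```python
from typing import Dict, Set, Tuple


def select_locally_available(scores: Dict[str, float], sizes: Dict[str, int],
                             budget_bytes: int) -> Set[str]:
    """Greedily keep the highest-scoring elements that fit in the storage budget."""
    chosen: Set[str] = set()
    used = 0
    # Sort by descending score; break ties by element id for a stable result.
    for element in sorted(scores, key=lambda e: (-scores[e], e)):
        if used + sizes[element] <= budget_bytes:
            chosen.add(element)
            used += sizes[element]
    return chosen


def plan_changes(current: Set[str], target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (elements to promote, elements to demote) to move from current to target."""
    return target - current, current - target
```

Re-running the selection with a smaller `budget_bytes` models the case where free space on the device decreases: lower-scoring elements fall out of the target set and appear in the demotion set.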

FIG. 25 is a set diagram 2500 illustrating the elements of thecollection compared to elements of the locally available set accordingto some examples of the present disclosure. FIG. 25 shows the entire setof elements of a collection represented by the entire area of the circle2505. Locally available set 2510 is a subset of the entire set ofelements in the collection. The non-locally available set of elements2515 is shown as the elements in the collection that are not in thelocally available set and is represented by the diagonal stripes.

FIG. 26 is a system diagram illustrating a data distribution systemutilizing predictive storage according to some examples of the presentdisclosure. The file system elements of collection A are files 1-6 shownat 2620. In some examples, certain types of nodes may store andsynchronize all elements of the collection. An example of such a nodemight be a desktop computer, such as Node A 2625. Node A, for example,includes data store 2630 with all six files stored. The locallyavailable set for Node A 2625 thus constitutes the entirety ofcollection A. The non-locally available set for Node A 2625 has nomembers. Node A 2625 receives both event data 2635 and file system data2640 for collection A. For example, an event may notify Node A 2625 thatfile 6 has been added to collection A. Node A 2625 may download file 6from cloud storage 2645. In other examples, the events and data shown inFIG. 26 that are being exchanged with the cloud storage 2645 may beexchanged with another peer node. In some examples Node A 2625 alsostores events corresponding to files 1 through 6 on data store 2630. Inthese cases, nodes that store all elements of a collection may not runthe prediction algorithms that predict the set of elements that a useris likely to need in the future.

Node B 2650 for example may be a mobile device with limitedfunctionality. Node B 2650 may have a limited data store, or was notconfigured by a user to store any of the six files of the collection.Instead, Node B 2650 monitors information about collection A, such asreceived events data 2665. For example, Node B may store in data store2655 the events corresponding to elements of the collection, but not theelements themselves. This is represented by the dashed linerepresentation of files one through six in data store 2655. In thisexample, the locally available set is empty and the non-locallyavailable set constitutes the entirety of collection A. If the user ofNode B 2650 wishes to interact with any one of files one through six,Node B 2650 may download the file from the cloud storage 2645 or fromanother node that has the desired file. For example, messages includingcollection A data 2660 may include portions of one or more of files onethrough six that were sent in response to a request for one or more ofthe files by user of Node B 2650.

Node C 2670 for example may be a mobile device with more advanced capabilities than Node B 2650. Node C has a limited data store 2675 in that it is able to, or configured to, store only files 1-3 from the collection in data store 2675. Thus the locally available set constitutes files 1-3 and the non-locally available set constitutes files 4-6. Each node in the data-distribution system may have different characteristics that determine how many elements from the collection are in the locally available set and which elements from the collection are in the non-locally available set. For example, data store 2675 on Node C may be larger than data store 2655 of Node B, enabling additional local storage of elements in the collection (and thus an increased locally available set). In other examples, user preferences for Node C may allocate more space to collection A than the user preferences for Node B 2650 allocate to collection A on Node B 2650. Node C 2670 receives events data 2665 for elements in the collection and data for files 1-3. If the user of Node C 2670 wishes to interact with files 4-6, the node may obtain one or more of those files if the node has network connectivity. In this instance, if there is not enough storage space on Node C to store a selected file from the non-locally available set, one of the files currently stored in local storage (e.g., files 1-3) may be overwritten with the selected file.

As already noted, in order to determine which elements of the collection are in the locally available set and which elements are in the non-locally available set, the system may utilize one or more predictive algorithms. These algorithms determine which members of the collection a user has a high probability of interacting with during a future timeframe. The predictive algorithms may utilize one or more signals to make this determination. Interacting with an element of the collection may include opening the element, editing the element, modifying the element, deleting the element, moving the element, or the like. The future timeframe for the prediction may be any desired future timeframe. Examples include the timeframe extending from the time the prediction is calculated to a predetermined amount of time afterwards. The predetermined amount of time may include infinity (no time limit) or other timeframes. For example, the algorithms may determine which members of the collection have the highest probability of a user interaction in the next hour, the next two hours, the next day, the next week, the next month, and the like.

Signals (e.g., factors), as used in the context of predictive storage,are data that is collected or observed by the data distribution systemand that may be used to determine a locally available set. The termsignal is not used in the sense of a transmitted signal or carrier wave(although the signal may be communicated as such), but rather a piece ofinformation that may indicate that a user is more or less likely toaccess an element of the collection in a particular timeframe. Thesesignals may include the usage history of the elements in the collection,user interest information, information on user context associated withthe usage history, information on the current context of the user andthe like. The predictive algorithm may use multiple different types ofsignals and each different type may be weighted or used differently.These signals may relate to the user of the node, relate to all users ofthe collection, or both. In some examples, certain signals may relate toonly the user of the node and other signals may relate to all users ofthe collection. For example, if a particular element is openedfrequently by other users in the collection, it may be more likely thatthe particular user of the node will open that element.

Usage history signals may include past interactions with an element. This may be determined by analyzing events on the event stream. As already noted, the system may utilize one or both of the usage history of the specific user of the node and the usage history of other users in the collection. In some examples, prior usage history corresponding to the user of the node may have greater predictive weight than the prior usage history of other users of the collection. Using history of the other users of a collection allows the system to move elements into the locally available set that the user of the node has not interacted with in the past, but is likely interested in based upon interest of other users of the collection.
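One way to turn usage history into a per-element score is sketched below. The specific weights, the exponential time decay with a one-week half-life, the event tuple layout, and the function name `usage_scores` are all assumptions made for illustration; the description states only that the node user's history may carry greater predictive weight than that of other users.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (element_id, user_id, timestamp_in_seconds)
Event = Tuple[str, str, float]


def usage_scores(events: Iterable[Event], node_user: str, now: float,
                 own_weight: float = 1.0, others_weight: float = 0.25,
                 half_life: float = 7 * 86400.0) -> Dict[str, float]:
    """Score each element from past interactions found on the event stream."""
    scores: Dict[str, float] = {}
    for element_id, user_id, timestamp in events:
        age = max(0.0, now - timestamp)
        decay = 0.5 ** (age / half_life)   # older interactions count for less (assumption)
        weight = own_weight if user_id == node_user else others_weight
        scores[element_id] = scores.get(element_id, 0.0) + weight * decay
    return scores
```

An element that only other users have opened still receives a nonzero score here, which is how such an element could enter the locally available set without any prior interaction by the user of the node.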

User context information may be any information about the circumstancesof a user. Context information may include the date, time, location ofthe user device, a user's schedule, a user's biometric information, andthe like. User context information may be correlated to usage history toprovide the context around a user's (either the node user or any otheruser in the collection, or both) interaction with the elements in thecollection. In this manner, user context information may correspond topast usage history. For example, the date of a past interaction, thetime of a past interaction, the location of a past interaction, and thelike may allow the prediction algorithms to make accurate predictionsabout which elements a user is likely to interact with given the user ofthe node's current context. For example:

-   Time of day: at certain times of day a user may be more likely to interact with certain elements of the collection. For example, a user may be more likely to interact with a bedtime story at night. If the system detects a pattern of interaction with particular elements of the collection at certain times of day, it may move those elements to the locally available set just before the particular time of day.
-   Date: certain elements may be more likely to be interacted with on certain dates, or seasons. For example, a user may be more likely to interact with a holiday recipe around the time of the holiday. If the system detects a pattern of interaction with particular elements of the collection at certain dates, it may move those elements to the locally available set just before the particular date.
-   The user's schedule: elements of a collection may relate to certain events of the user. For example, documents associated with a scheduled meeting are likely to be interacted with during the meeting or just before. If the system detects that particular elements of the collection correspond to the user's schedule, it may move those elements to the locally available set just before the scheduled event.
-   A physical location: certain documents may be more likely to be interacted with when the user is in a certain physical location. For example, a user may be more likely to interact with a digital ticket to an event when the user is near the stadium. If the system detects that particular elements of the collection correspond to a particular event or location, it may move those elements to the locally available set just before the particular event or just before the user travels to that location.
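The time-of-day case from the list above can be sketched as a simple frequency count. The one-hour granularity, the `min_count` threshold, and the function names `hourly_patterns` and `elements_to_prefetch` are hypothetical; a real predictor could combine this with the other context signals.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def hourly_patterns(interactions: Iterable[Tuple[str, int]]) -> Dict[int, Counter]:
    """Count past interactions per hour of day from (element_id, hour) pairs."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for element_id, hour in interactions:
        counts[hour % 24][element_id] += 1
    return counts


def elements_to_prefetch(counts: Dict[int, Counter], current_hour: int,
                         min_count: int = 3) -> List[str]:
    """List elements habitually used in the upcoming hour, to promote just beforehand."""
    upcoming = (current_hour + 1) % 24
    return sorted(e for e, n in counts.get(upcoming, {}).items() if n >= min_count)
```

For instance, if a bedtime story was opened at hour 20 on at least `min_count` past occasions, a check at hour 19 would list it for promotion into the locally available set.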

User interest information may include signals which indicate topics or elements that a user is interested in. For example, the user may explicitly indicate certain files that are to be maintained in the locally available set. In other examples, the system may learn topics the user is interested in and score elements of the collection that are associated with that topic higher than other elements. The system may analyze the contents of past elements that a user has interacted with to determine topics associated with those collection elements. Collection elements that score highly for topics that were interacted with most in the past may then be more likely to be moved to the locally available set. These calculations may be time weighted such that collection elements relating to topics recently accessed by the user would be rated more highly than collection elements relating to topics accessed by the user further back in time. Determining topics of elements of the collection may be done automatically using algorithms such as Latent Dirichlet Allocation (LDA), or manually through document tagging. Using latent topics to derive interests allows the system to move elements into the locally available set that the user of the node has not interacted with in the past based upon a predicted interest.

Other sources of information may also or instead be used to deduce auser's interests. For example, skills information associated with asocial networking account may relate to topics. Job titles may relate totopics of interest (e.g., if a person is a computer programmer, elementswhich relate to computer programming may be scored higher). Documentsand topics discussed on social media websites and blogs frequented bythe user may be scored higher. A user may provide access to an email,social networking, or other user profile account which may be analyzedfor topics of interest. Text may be analyzed for topics using algorithmssuch as LDA or manually through tagging.

The signals used in the predictive algorithms may be determined from avariety of sources. For example, such sources may involve internalsensors on the node or external sensors communicatively linked to thenode. Example sensors may include global positioning system (GPS)sensors, accelerometers, biometric sensors, cameras, microphones, andthe like. Other signals may be determined based upon communications withother applications on the node, or other applications on other nodes orother computing devices. For example, a user's schedule may be obtainedthrough communication with a calendar application (either on the node oranother computing device). A user's social networking data may beaccessed by accessing a social networking service through an applicationprogramming interface, and the like. Usage signals may be determinedbased upon event data such as event data 2635, 2665, and 2680.

Turning now to FIG. 27, a flowchart 2700 of a method illustrating a data distribution system utilizing predictive storage according to some examples of the present disclosure is shown. As illustrated, a predictive algorithm may score elements in the collection based upon an assessment of how likely the user is to interact with each element in the future (operation 2705).

In an example, the score for each element may utilize one or more of the following formulae, which utilize one or more signals:

SubscoreA = 100 − ((the number of days since the element was last opened by the user) * 0.5)

SubscoreB = 100 − (the number of days since the element was last explicitly retrieved by the user)

SubscoreC = 75 − (the number of days since the element was last modified by the user)

SubscoreD = 50 − (the number of days since the element was last modified by other users of the plan)

In one example, one or more of the above formulae are utilized in combination. For example, the subscores may be added together to form a total score:

Total score for an element of the collection = weightA * |SubscoreA| + weightB * |SubscoreB| + weightC * |SubscoreC| + weightD * |SubscoreD|

weightA, weightB, weightC, and weightD may have a value of one to weight all the subscores the same, or may be weighted differently. One or more of weightA, weightB, weightC, and weightD may have a value of zero to nullify any contribution from that signal. The weights may change dynamically over time in response to additional usage history information. For example, if the user is consistently selecting elements of the collection that are in the non-locally available set as opposed to the locally available set, the system may compare the scores for each subscore between the locally available set and the recently selected items on the non-locally available set and adjust the weighting of the subscores such that the recently selected items from the non-locally available set would have been selected. The constant values (e.g., 100, 75, 50, 0.5) are simply examples to aid the reader's understanding, and one of ordinary skill in the art with the benefit of this disclosure will understand that other values may be possible and are within the scope of the disclosure. In some examples, the constants may change depending on the timeframe that the prediction is based upon. For example, if the prediction is intended to cover the near future, the constants may be shrunk so that usage history in the distant past drops out of the prediction faster (e.g., it takes less time for a previous interaction to drop to zero in the calculations).
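The subscore formulae and the weighted total described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the names `ElementUsage`, `subscores`, and `total_score` are hypothetical, and the constants are the example values given in the text.

```python
from dataclasses import dataclass


@dataclass
class ElementUsage:
    """Days elapsed since each kind of interaction (hypothetical structure)."""
    days_since_opened: float
    days_since_retrieved: float
    days_since_modified_by_user: float
    days_since_modified_by_others: float


def subscores(u: ElementUsage) -> dict:
    # Constants 100, 75, 50, and 0.5 are the example values from the text.
    return {
        "A": 100 - u.days_since_opened * 0.5,
        "B": 100 - u.days_since_retrieved,
        "C": 75 - u.days_since_modified_by_user,
        "D": 50 - u.days_since_modified_by_others,
    }


def total_score(u: ElementUsage, weights: dict) -> float:
    # Weighted sum of subscore magnitudes, following the total score formula.
    # A missing or zero weight nullifies that signal's contribution.
    s = subscores(u)
    return sum(weights.get(k, 0.0) * abs(v) for k, v in s.items())


usage = ElementUsage(10, 20, 5, 30)
equal = {"A": 1, "B": 1, "C": 1, "D": 1}
print(total_score(usage, equal))  # 95 + 80 + 70 + 20 = 265.0
```

Passing a weights dictionary that omits a key (for example `{"A": 1}`) has the same effect as setting that weight to zero.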

Other formulas and weightings are also possible and within the scope ofthe present disclosure. For example, machine learning algorithms may beused to score the elements in the collection by learning usage patternsand scoring elements based upon a calculated probability that the useris likely to interact with each element. For example, a predictive modelmay be built using past signals and may output scores for elements ofthe collection based upon new signals. The scoring may also be modifiedor manipulated by an end user or network administrator according to userpreferences, network policies, and the like.

Signals used to construct the model may be specific to a collection, ormay be from all collections the user is a member of, or may be from alluser data system wide across all collections. In some examples the modelmay be built by an individual node for use on that node. In otherexamples, nodes such as the authority, a cloud node, or a node withsufficient processing power may build the model. The model may then bedistributed to one or more nodes participating in the collection. Inother examples, the node or server which builds the model may keep themodel and make the predictions for each node and inform each node onwhich files to download based upon event streams and other signals thatmay be communicated to the node that builds the model.

Once the model is constructed, various signals may be collected by the nodes and used to make predictions using the model by scoring the elements of the collection. Example machine learning algorithms that may be used include neural networks, decision tree algorithms, linear regression, logistic regression, support vector machines, Bayesian networks and naïve Bayesian algorithms, K-nearest neighbors, and the like. In some machine-learning algorithms, the score may be binary (yes or no); that is, the binary result answers the question: given the signal data, is this file likely to be accessed by the user in the future?

In other examples, a plurality of different algorithms may be combined to produce the score. Each algorithm's individual score for an element may be normalized, weighted, and combined with scores from the other algorithms. For example, each item score may be generated by calculating a first score from a neural network and a second score from a decision tree for each item and then calculating a final score for the item by summing the first and second scores. As noted, the first score may be weighted with a first value and the second score may be weighted with a second value.
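One way to picture the normalize, weight, and combine step is the sketch below. The text does not specify a normalization method; min-max scaling is used here purely as an assumption, and the function names, the raw model outputs, and the weights 0.75 and 0.25 are hypothetical.

```python
def min_max_normalize(scores: dict) -> dict:
    """Rescale raw scores to the range [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def combine(score_sets: list, weights: list) -> dict:
    """Weighted sum of normalized per-algorithm scores for each element."""
    normalized = [min_max_normalize(s) for s in score_sets]
    elements = normalized[0].keys()
    return {
        e: sum(w * n[e] for w, n in zip(weights, normalized))
        for e in elements
    }


# Hypothetical raw outputs of two models for three elements.
neural_net = {"x": 0.9, "y": 0.5, "z": 0.1}
decision_tree = {"x": 10.0, "y": 30.0, "z": 20.0}

final = combine([neural_net, decision_tree], weights=[0.75, 0.25])
ranked = sorted(final, key=final.get, reverse=True)
print(ranked)  # ['x', 'y', 'z']
```

Normalizing first keeps a model with a large output range (here the decision tree) from dominating the sum regardless of its weight.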

Once the scores for the elements in the collection are determined (operation 2705), the set of locally available elements may be chosen based upon the scores assigned to elements of the collection (operation 2710). Which elements and how many elements to choose may involve a selection algorithm. The selection algorithm may include as input the node's device characteristics, user preferences, and the scores for the elements in the collection.

Device characteristics may include a size for the locally available set.For example, the system may take the top N scoring elements (where N isa predetermined number that is greater than or equal to zero). In otherexamples, the system may take the top N % of elements (where N is apredetermined percentage that is greater than or equal to zero). In yetother examples, the system may have a size quota that specifies themaximum size that the locally available collection can take up on localstorage. The size for the locally available set may be set by userpreferences, the free space available on local storage of the node, or acombination of both. For example, a user may specify that the locallyavailable set may take up a percentage of available local storage.

The selection algorithm may start at the highest scoring elements and work down to the lowest scoring elements until the quota is filled. In some examples, if the quota has not yet been met, but the next highest scoring element will exceed the quota if stored in local storage, the system may continue checking elements in descending order of score for the highest scoring element that fits under the quota. For example, if the quota is 500 MB, the locally available set is currently at 498 MB, and the remaining items in the non-locally available list are as follows:

Directory “x” with a size of 6 megabytes (MB) and a score of 74;

File “y” with a size of 1.5 MB and a score of 67;

File “z” with a size of 1 MB and a score of 65.

In this example, the system will bypass directory “x” as it exceeds the storage quota and will instead select file “y” for local storage as it is the next highest scoring element.
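The quota-filling pass in this example can be sketched as below. The function name `greedy_fill` and the tuple layout are hypothetical; the sizes, scores, and quota are the values from the example. As an assumption beyond the text, the sketch keeps scanning after each element that fits, so it may add further lower-scoring elements while space remains.

```python
def greedy_fill(candidates, used_mb, quota_mb):
    """candidates: list of (name, size_mb, score). Returns the chosen names."""
    chosen = []
    # Visit candidates from highest score to lowest.
    for name, size, _score in sorted(candidates, key=lambda c: c[2], reverse=True):
        # Skip any element that would push the set over the quota.
        if used_mb + size <= quota_mb:
            chosen.append(name)
            used_mb += size
    return chosen


remaining = [("x", 6.0, 74), ("y", 1.5, 67), ("z", 1.0, 65)]
print(greedy_fill(remaining, used_mb=498.0, quota_mb=500.0))  # ['y']
```

File “z” is not added in this run because, once “y” is stored, only 0.5 MB of the quota remains.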

In still other examples, a maximization function may seek to maximize the combined scores of the elements in the locally available set given the constraint of the quota size. Thus, the elements of the collection may be:

Directory “x” with a size of 6 MB and a score of 74;

File “p” with a size of 5 MB and a score of 73;

File “y” with a size of 1.5 MB and a score of 67;

File “z” with a size of 1 MB and a score of 65.

If the quota size is 7 MB, rather than selecting directory “x” and file “z” for a combined score of 139, the system may select file “p” and file “y”, which lead to a combined score of 140.
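The maximization in this example is an instance of the 0/1 knapsack problem. The sketch below solves it by exhaustive search over subsets, which is adequate only for small element counts and is shown here for clarity; it is an assumed approach, not one named in the text, and `best_subset` is a hypothetical name. A dynamic programming formulation would be the usual choice for larger collections.

```python
from itertools import combinations


def best_subset(elements, quota_mb):
    """elements: list of (name, size_mb, score). Exhaustive search."""
    best, best_score = (), 0
    for r in range(1, len(elements) + 1):
        for combo in combinations(elements, r):
            size = sum(e[1] for e in combo)
            score = sum(e[2] for e in combo)
            # Keep the highest-scoring subset that fits within the quota.
            if size <= quota_mb and score > best_score:
                best, best_score = combo, score
    return [e[0] for e in best], best_score


elements = [("x", 6.0, 74), ("p", 5.0, 73), ("y", 1.5, 67), ("z", 1.0, 65)]
print(best_subset(elements, quota_mb=7.0))  # (['p', 'y'], 140)
```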

Once the set of locally available elements is determined, the system may determine whether any elements of the locally available set are already in local storage (operation 2715). For example, if the locally available set is empty (e.g., the node just joined the collection), or if an element in the collection was newly promoted to the locally available set, one or more elements may need to be retrieved. If no elements need to be retrieved, the present flow ends (at operation 2730). If new elements need to be obtained, the node may request the elements not already in the node's local data store (operation 2720). The new elements may be stored (operation 2725). If elements have been demoted from the locally available set to the non-locally available set, the demoted elements may be removed from the local data store and replaced by the newly received elements.
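Operations 2715 through 2725 amount to reconciling two sets: the newly chosen locally available set and the elements already held in the local data store. A minimal sketch, with the hypothetical function name `plan_transfers` and element identifiers represented as plain strings:

```python
def plan_transfers(locally_available: set, in_local_store: set):
    """Return (to_retrieve, to_evict) for the node's local data store."""
    # Promoted elements that are not yet stored locally must be requested.
    to_retrieve = locally_available - in_local_store
    # Demoted elements still held locally may be removed to free space.
    to_evict = in_local_store - locally_available
    return to_retrieve, to_evict


to_get, to_drop = plan_transfers({"a", "b", "c"}, {"b", "d"})
print(sorted(to_get), sorted(to_drop))  # ['a', 'c'] ['d']
```

When `to_retrieve` is empty the flow would end without any request, which corresponds to operation 2730.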

The method described in FIG. 27 may be executed or deployed when the device first adds or joins the shared storage plan. In addition, the method may be executed or deployed at regular time intervals or in response to predetermined conditions. For example, if the algorithms that are used calculate scores based upon a granularity of days (e.g., the number of days since the last access, since creation, etc.), then the scores may be calculated and evictions may happen daily. If the algorithms use a finer granularity (e.g., hours), then the periodicity between score calculations and evictions may be more frequent (e.g., the periodicity may match the granularity). In some examples, the methods illustrated in FIG. 27 may be executed or deployed in response to receiving an event indicating a change in an element of the collection.

The method of FIG. 27 may be executed on the nodes themselves, andindeed, the algorithms used may be customized based upon each node. Forexample, on a mobile device such as a smartphone, the limited screendisplay and the more difficult text entry may make editing moredifficult. Therefore, users of these devices may use the device more forviewing content and less for editing content. In these examples, contentthat is viewed more often may be a more powerful signal that the user isserver nodes, or on one or more peer clients.

In some examples, one or more operations of FIG. 27 may be performed by one node for a different node. For example, a mobile device may have limited processing power and limited battery available for the necessary computations to calculate the set of elements that should be locally available. Thus a more powerful node with a better power supply (e.g., a store point node, a peer node, or the like) may calculate the set of locally available elements or perform any other operation of FIG. 27. Example nodes include peer nodes, server nodes, and the like.

FIG. 28 shows an example logical diagram 2800 of a node 2805 accordingto some examples of the present disclosure. The modules described inFIG. 28 may implement the predictive storage functions as describedherein and the modules may be in addition to, or instead of the modulesdescribed in other sections in this specification. Input and outputmodule 2810 may communicate with other nodes, including peer nodes, theauthority, and server nodes. Input and output module 2810 may requestand receive items in the collection as well as events corresponding toitems in the collection. In some examples, input and output module 2810may receive one or more predictive models, user preferences, signals, orother information. Input and output module 2810 may communicate withother applications on the node, and may communicate with other computingdevices. For example input and output module 2810 may request andreceive signal information from these external sources.

Prediction module 2815 may utilize signals and event informationreceived from input and output module 2810 to score one or more items inthe collection according to the configured prediction algorithm. Forexample, prediction module 2815 may utilize past usage history of itemsin the collection to score the items. In other examples, predictionmodule 2815 may build a predictive model based upon observed usagehistory and signals. Once the model is built, the prediction module maycalculate scores for the collection, update scores based upon newsignals, and the like.

Control module 2820 may utilize the scores for the elements generated bythe prediction module 2815 to run a selection algorithm to select theset of locally available elements. In some examples, the set of locallyavailable elements may be limited to a particular number or size. Thesize or number of elements may be set by user preferences on the node orby node device characteristics. The user preferences may be setupthrough a user interface provided by input and output module 2810. Inother examples the size or number of elements may be set based upon analgorithm that may consider the individual node's characteristics. Forexample, the algorithm may determine the size of the set of locallyavailable elements based upon the available data store 2825. Forexample, the algorithm may choose a size that utilizes up to X % ofavailable local storage (where X may be a predetermined or userconfigurable number). In some examples, the size of the set of locallyavailable elements may fluctuate as the available local storagefluctuates.

To calculate the set of locally available elements, in some examples, the scores of each element may be utilized. As used herein, elements with high scores are elements that the system has determined the user of the node has a high probability of interacting with in the future. Whether the scores are organized such that lower numerical scores represent the highest probabilities or higher numerical scores represent the highest probabilities, it is understood that herein, the highest scores indicate the highest probabilities.

In some examples, the node may fill the set of locally available elements with the highest scoring elements and work down until the maximum number of items is filled or until the space allocated for the locally available set is filled. In other examples, the node may utilize a maximum utility function to maximize the total scores of all the elements in the locally available set.

Control module 2820 may also request, through input and output module 2810, elements to be downloaded to the node. For example, elements that are to be added to the locally available set may include elements that are not currently stored in the node's local storage. Control module 2820 may also overwrite any element stored in data store 2825 with a newly downloaded element if necessary. Control module 2820 may also be responsible for triggering a re-scoring of one or more elements in the collection responsive to receiving one or more events via the input and output module 2810. Control module 2820 may also re-score the elements in the collection periodically. The period of this re-score may be predetermined by the system, set by a user, or may be dynamically alterable. For example, the control module 2820 may re-score the elements in response to a determination that, over a predefined period of time, the user has attempted to interact with a predetermined number of elements in the non-locally available set of items. This may indicate that the set of locally available items is not optimal and may need to be recalculated. Feedback on the number of items selected from the non-locally available set and the locally available set may also be used by the prediction module 2815 to refine the weightings used in the scorings, or used to refine the model (e.g., adjust neural network weightings, and the like).
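The miss-driven re-scoring condition (a predetermined number of attempts to use non-locally available elements within a predefined period) can be sketched as a sliding-window counter. The class name `RescoreTrigger`, its parameters, and the choice to reset the counter after firing are all assumptions made for illustration.

```python
from collections import deque


class RescoreTrigger:
    """Signals a re-score after max_misses misses within window_seconds."""

    def __init__(self, max_misses: int, window_seconds: float):
        self.max_misses = max_misses
        self.window = window_seconds
        self._misses = deque()  # timestamps of recent misses

    def record_miss(self, now: float) -> bool:
        """Record an access to a non-local element; True means re-score."""
        self._misses.append(now)
        # Discard misses that have aged out of the window.
        while self._misses and now - self._misses[0] > self.window:
            self._misses.popleft()
        if len(self._misses) >= self.max_misses:
            self._misses.clear()  # assumed: start counting afresh
            return True
        return False


trigger = RescoreTrigger(max_misses=3, window_seconds=60.0)
results = [trigger.record_miss(t) for t in (0.0, 10.0, 20.0, 100.0, 200.0)]
print(results)  # [False, False, True, False, False]
```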

Data store 2825 may comprise any local storage on the node. Examplesinclude solid state memory, flash memory, magnetic media, hard drives,volatile memory, such as Random Access Memory, and the like. Sensormodules 2835 may provide signals to the prediction module 2815 and insome examples control module 2820. Sensors may include globalpositioning system (GPS) sensors, accelerometers, g-force meters,compasses, biometric sensors, light sensors, microphones, image capturedevices, and the like.

Collection Events and User Interface Interactions

The aforementioned details can be combined into a collection-manageddata distribution mechanism. In such a mechanism, the collection is usedto drive data distribution of file system elements amongst participantnodes. The following examples typically are described from theperspective of a single node and its interactions with other entities(e.g., store point, authority, or other endpoint nodes).

FIG. 29 illustrates a diagram showing event coordination and processingin a distributed data system 2900, according to an example. Thedistributed data system 2900 can include a number of sensors todetermine local file system element changes that can result in filesystem element events. These sensors can include OS events 2905 (e.g.,direct communications from the file system), a scanner 2910 (e.g., toscan file system elements for locations or changes), and eventsynchronization 2915 (e.g., modifications from other nodes). Remoteevents occurring after initial synchronization (i.e., ad-hoc events),can be received via the remote event receiver 2930. Events can be pushedthrough an event coordinator 2925, which can store them into the planevents database 2920, modify a file system element version database2935, and post them to a publish/subscribe facility 2940 for furtherprocessing. The publish/subscribe facility 2940 can provide events tothe user interface 2945, an event publisher 2955 for remote consumptionby other nodes (e.g., Node A 2960), and an event fulfiller 2950 toaddress events that require action on the distributed data system 2900.
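The flow from the event coordinator 2925 to the publish/subscribe facility 2940 and on to its consumers can be pictured with the sketch below. It is a simplified in-memory model under stated assumptions: the class names, the single `"events"` topic, and the event dictionary fields (`path`, `version`, `kind`) are hypothetical, and plain Python lists and dictionaries stand in for the databases and queues of FIG. 29.

```python
from collections import defaultdict


class PubSub:
    """Minimal publish/subscribe facility (stands in for 2940)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


class EventCoordinator:
    """Stores each event, updates versions, then posts it (stands in for 2925)."""

    def __init__(self, pubsub):
        self.events_db = []   # stands in for the plan events database 2920
        self.versions = {}    # stands in for the version database 2935
        self.pubsub = pubsub

    def push(self, event):
        self.events_db.append(event)
        self.versions[event["path"]] = event["version"]
        self.pubsub.publish("events", event)


bus = PubSub()
fulfil_queue = []  # stands in for queue 2965 fed by the event fulfiller 2950
outbound = []      # stands in for events handed to the event publisher 2955
bus.subscribe("events", fulfil_queue.append)
bus.subscribe("events", outbound.append)

coord = EventCoordinator(bus)
coord.push({"path": "docs/a.txt", "version": 2, "kind": "changed"})
print(len(coord.events_db), len(fulfil_queue), len(outbound))  # 1 1 1
```

Routing every event through one coordinator means that local sensors and remote receivers share a single path for persistence and fan-out.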

The event fulfiller 2950 can store the locally processed events in aqueue 2965 that can feed the transport engine 2970. The transport engine2970 can manage reception and transmission of file system elementcontents, such as block data. The transport engine 2970 can include aretrieve CS 2975 and a send CS 2980 to respectively handle reception andtransmission of the contents data. Whereas the queue 2965 feeds thetransport engine 2970 with local events, remote requests for contentsdata can be fed through a remote transport queue 2990. Such remoteevents can originate from other nodes, such as Node B 2995. Finally, theretrieve file callback 2985 can receive verifications or failures ofcontents data operations from the transport engine 2970 to update theplan events database 2920, the versions database 2935, or other entitiesinterested in consuming this information.

The framework described above can interact with users in a variety ofways. The interactions can include creating a collection, adding filesystem elements to a collection, removing file system elements from acollection, and leaving a collection entirely (other issues, such aschanges to files, and the like are handled by observing file systemevents or changes to the files directly and do not use user inputspecifically directed to that end). In an example, a user access pointcan be monitored to receive user input and a UI can be provided inresponse to that user input. In an example, the user access point can bea click (e.g., a right click) on a file system element in a file browserto bring up a contextual menu including a collections menu. The UI isthe result of the collections menu selection. In an example, the accesspoint can be a URI or local collections fat-client.

The UI can include elements corresponding to any one or more actions to create a collection, leave a collection, share a file system element, or change collection affiliation (including removal from a collection) of a file system element. Upon selection of one of these elements, contextual data can be collected and distributed (e.g., as discussed above with respect to FIG. 29) to collection participant nodes. The following example describes the creation of a collection; however, similar techniques can be used for other operations.

After the UI is provided, an indication to create a collection can bereceived via the UI. A collection type can be identified based on theindication and a context of the UI. For example, the UI can include achoice to create a backup, personal, or multi-user collection when afile system element is right-clicked in a file browser. The indicationcan include the user's selection of one of these options. In an example,the context of the indication (e.g., files with a copy restriction,etc.) can be used to further limit available collection types. Forexample, a personal or backup collection can be available when such acopy restriction is in place but a multi-user collection is notavailable. In an example, the size of the file system element can be acontext derived attribute that can limit collection types (e.g., abackup collection can take a large file while the personal or multi-usercollections cannot). In this example, a set of predetermined sizes cancorrespond to available collection types.
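The way context can narrow the collection types offered to the user might look like the following sketch. The function name and the 1024 MB threshold are hypothetical; the text states only that a copy restriction rules out multi-user collections and that a set of predetermined sizes can limit large elements to backup collections.

```python
def available_collection_types(copy_restricted: bool, size_mb: float,
                               large_file_threshold_mb: float = 1024.0):
    """Return the collection types that may be offered for an element."""
    types = ["backup", "personal", "multi-user"]
    if copy_restricted:
        # A copy restriction rules out sharing with other users.
        types.remove("multi-user")
    if size_mb > large_file_threshold_mb:
        # Assumed threshold: only backup collections accept large elements.
        types = [t for t in types if t == "backup"]
    return types


print(available_collection_types(copy_restricted=True, size_mb=1.0))
# ['backup', 'personal']
```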

After the collection type is identified, a collection schema can beobtained. In an example, each collection type corresponds to a differentcollection schema. In an example, a plurality of collection typescorrespond to the same collection schema. As described above, thecollection schema can include data definitions used to manage datadistribution for file system elements in the collection. Accordingly,the receipt of an identification of the set of file system elements thatare part of collection allows collection-managed data distribution. Inan example, this identification can be derived from the context of theUI (e.g., which file system element was right-clicked, etc.). Furtherdata definition fields of the collection schema can be populated inaccordance with the user indication, the UI, and the file system elementidentifications. Thus, required data definitions can be fulfilled viacontext, the user interface (e.g., prompting the user for the data, suchas the collection name), or the identified set of file system elementsthemselves. Finally, a portion of the collection definitions of thecollection schema can be communicated to a plurality of nodesparticipating in the collection. A variety of details of nodeparticipation is described above, but generally entails the inclusion ofdevices that have authenticated to an authority for the collection, aswell as the node that created the collection. The portion of thecollection definitions can be limited to endpoint node information.Thus, for example, the local root data for a first node is notcommunicated to a second node.

The example above described the creation of a collection. Leaving a collection or removing a file system element from a collection is generally a simpler process because the collection-specific information is already known. In these examples, the mechanisms to identify collection schemas and populate the data definitions can be replaced with transmitting an indication of the change or deletion.

After a collection is created and communicated to a node, the node canuse the collection to manage data distribution. This can includereceiving the portion of the collection schema and synchronizing a localevent stream with participant nodes as identified in the portion of thecollection schema data definitions. As described above, the specificnodes to which the event stream will be synchronized are the nodes thatthe present node(s) decides to connect to, and may not include everyparticipant node in the collection.

A state of a local file system element identified in the set of filesystem elements can be identified. This state can correspond to aplurality of pre-defined states, such as changed, deleted, versionnumber, etc. A communication can be issued to the list of participantnodes in response to this state identification. In an example, the localstate is determined after a remote event is received from another node.In this example, the communication is one to retrieve the file systemelement contents needed by the node in order to satisfy the event. Afterthe communication is issued, a response from another participant node(not necessarily the one who initiated the event) can be received andused to complete the data distribution event (e.g., to make the localfile system element current to the most recent version of the filesystem element at the event initiating node).

In an example, the local state is determined as a matter of course, suchas via a local file system event or monitoring by the node. In thisexample, the most recent version of the file system element is local andhas not yet been distributed to other participant nodes. Thecommunication can be the event indicating the current state of the filesystem element, such as its version and contents. The response from theparticipant node, to the communication, can be a request for thecontents needed by the participant node. The node can complete the datadistribution event by transferring the requested contents.

FIG. 30A is an example user interface that illustrates collectionmanagement operations in a file system browser 3000, according to anexample. As described above, a file system browser 3000 can provide auser action point to initiate collection actions. This has the advantageof conveniently locating the collection operations in the user interfacethat users generally employ to manage file system elements. Thecontextual menu 3005 includes a collection option, which is the useraction point. User indications at the user action point (e.g., clickingon, moving the pointer to that menu option, etc.) can activate the UI3010 that provides the user the option to create the collection, amongother available actions. In an example, an additional UI can betriggered by the elements of the UI 3010 to, for example, acceptadditional user input specific to each of the illustrated options.

FIG. 30B is an example user interface that illustrates collection management operations in a mobile graphical user interface (GUI) 3050, according to an example. Many mobile platforms (e.g., phones, tablets, etc.) are designed for application-centric rather than file-centric manipulation by users and also use small screens (often effectively limiting UI choices that are available on desktops). The mobile GUI 3050 illustrates a compact organization to address these common issues with mobile platforms. In this example, opening the mobile application (e.g., resulting in the mobile GUI 3050) is the user access point. The mobile GUI 3050 can include a new collection element 3055, as well as a list of current collections 3060 (in this example, “P” corresponds to a personal collection, “S” to a multi-user collection, and “B” to a backup collection, although in practice different visual or other indications can be used, or no distinguishing indication can be used), and a collection file system element viewing area 3065. Options similar to those discussed above with respect to FIG. 30A can, however, be present and operate in a similar manner.

FIG. 31A is a sequence diagram that illustrates operations 3100 tocreate a collection, according to an example. As noted previously, anAuthority and a store point node can participate in collection manageddata distribution. In this example, the store point serves as a remotedata backer for the node client and the Authority maintains its role inprovisioning collections. The node client makes a collection request tothe store point. The store point forwards that request on to theAuthority. In an example, the request is transmitted directly to theAuthority. In the example of collection creation, the Authorityidentifies the collection schema, populates the data definitions, andcommunicates the portion of the data definitions to the variousparticipant nodes. Thus, the Authority creates the collection.

After the collection is created, or other action is completed by theAuthority (e.g., leaving a collection by the node client, a user, etc.),a response is generated and transmitted back to the node client via thestore point, in an example. As the store point receives the collectionrequest response, it performs the requested action. Thus, for a creationof a collection, the store point creates the local data of thecollection from the response. Finally, the response arrives at the nodeclient, which can indicate to the user that the collection is created.In an example, where the Authority is unavailable, the collectionrequest (e.g., collection creation) would fail and the user can benotified of the failure.

FIG. 31B is a sequence diagram that illustrates operations tosynchronize a collection, according to an example. The operations ofFIG. 31B are similar to those of FIG. 31A, but include more detail on anexample starting with user input to create the collection, and followingwith collection creation and synchronization to a variety of nodes.Further, FIG. 31B illustrates the derivation of the local data path bythe node client. This is a context variable for the creation of thecollection that can be used to reduce user input.

FIG. 32 is a flowchart of a method 3200 illustrating a creation of acollection, according to an example. Operations of the method 3200 canbe performed by any appropriate computer system element described above.At operation 3205, a user action point can be observed. At operation3210, a UI can be provided to a user based on the user indicationreceived via the action point observation. At operation 3215, a plancreation (or other modification) indication can be received from theuser via the UI of operation 3210.

At operation 3220, a collection type can be identified based on any of the indication of operation 3215 or a context of the UI of operation 3210. At operation 3225, a schema for the collection type identified at operation 3220 can be obtained (e.g., from a local or remote data store or service). At operation 3230, identification of file system elements for the collection can be received. This identification can be based on the context of the UI of operation 3210 (e.g., which elements were selected when the creation indication was received) or can be explicitly provided by the user (e.g., via a web browser after the creation indication is received).

At operation 3235, collection definitions of the collection schema can be populated by any or all of the UI context (e.g., including user, location, or local file system element information), the identified file system elements themselves (e.g., contents of the file system elements such as shared user identification), or the explicit instructions of the user via the UI indication. At operation 3240, a portion of the collection schema data definitions can be communicated to participant nodes. In an example, operation 3215 can be performed by an Authority distinct from the client node.

FIG. 33 is a flowchart of a method 3300 illustrating a synchronization of a collection, according to an example. At operation 3305, a portion of collection schema data definitions for a collection can be received (e.g., at a node from an Authority). At operation 3310, an event stream can be synchronized with one or more participant nodes (e.g., as indicated by the portion of the data definitions from operation 3305).

At operation 3315, a file system element state can be identified from a local data store. In an example, the file system element state can be one of the following: changed (e.g., contents or metadata), old (e.g., not the latest version), deleted, added, or changed permissions. The state can correspond to data distribution events. For example, if the file system element is changed, the data distribution event can include notifying other participant nodes. In an example, if the state is old, the data distribution event can include retrieving the new version (or parts needed thereof) to make the local file system element equivalent to the newest version amongst the participant nodes.

At operation 3320, a communication can be issued to a list of the participant nodes in furtherance of completing (e.g., to move towards completing) the data distribution event. At operation 3325, a response to the communication of operation 3320 can be received. At operation 3330, the data distribution event can be completed using information from the response of operation 3325. Thus, the response may include content data to update a local version of the file system element, or the response can include a request for the changed content data, in which case the event is completed by sending that data to the participant node.
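The flow of operations 3315 through 3330 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the described implementation: the `ElementState` names, the `plan_action` and `complete_action` functions, and the dictionary message format are all hypothetical and chosen only to mirror the states listed above.

```python
from enum import Enum


class ElementState(Enum):
    """States named for operation 3315 (identifiers are illustrative)."""
    CHANGED = "changed"
    OLD = "old"
    DELETED = "deleted"
    ADDED = "added"
    PERMISSIONS_CHANGED = "permissions_changed"


def plan_action(element_id, state, participants):
    """Build the communication a node might issue at operation 3320.

    Hypothetical message format: a dict naming the kind of message,
    the element, its state, and the participant nodes to contact.
    """
    if state is ElementState.OLD:
        # The local copy is stale, so ask peers for the newest contents.
        kind = "request_contents"
    else:
        # The local copy changed in some way, so notify peers.
        kind = "publish_event"
    return {"kind": kind, "element": element_id,
            "state": state.value, "to": list(participants)}


def complete_action(message, response, local_store):
    """Finish the event using a peer's response (operation 3330)."""
    if message["kind"] == "request_contents":
        # The response carries content data; update the local version.
        local_store[message["element"]] = response["contents"]
        return None
    if response.get("kind") == "request_contents":
        # A peer asked for the changed data; reply with the contents.
        return {"kind": "contents", "element": message["element"],
                "contents": local_store.get(message["element"])}
    return None
```

In this sketch an old element triggers a pull and every other state triggers a notification; a fuller design would likely treat deletions and permission changes as distinct message kinds.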

FIG. 34 is a flowchart of a method 3400 illustrating a synchronization of a collection, according to an example. Throughout this description, discussion of particular collection types (e.g., backup, personal, multi-user) is generally unnecessary because the collection mechanics, including inter-node communication, generally do not vary between the various collection types. For example, the network topology described with respect to FIG. 6 does not change when versioned data stores (e.g., in store point nodes) are added to allow versioned backups, as described with respect to FIGS. 11-13. In fact, the same communications, data deduplication, and security mechanisms can be used, without modification, between a backup and a non-backup collection. Thus, the method 3400 illustrates just such a hybrid operation on a single communications platform, from the perspective of a node.

At operation 3405, the node can participate in a backup collection for a file system element in a local data store. That is, at least one file system element in the local data store is managed by a backup collection. At operation 3410, the node also participates in a personal or multi-user (e.g., non-backup) collection for the same file system element.

At operation 3415, the node can detect a change in the file system element in the local data store. At operation 3420, the change can be communicated to a plurality of participant nodes, at least one of which includes a versioned data store. The versioned data store is also a participant in the backup collection.

At operation 3425, the node can receive a synchronization event from a second one of the plurality of participant nodes (i.e., not the one with the versioned data store). At operation 3430, the node can restore a previous version of the file system element from the node with the versioned data store. For operation 3430, an additional UI element on the node can be provided to allow for previous version browsing, etc., allowing the user to select a previous version to restore.

An advantage of this hybrid system is responsive backup without having to batch operations or duplicate data that is already being synchronized. Moreover, the dual use of many elements in the data distribution mechanism can increase efficiency (e.g., by removing redundant running programs or redundant transfer of data), and reduce cost.

Computer System Implementation Examples

FIG. 35 illustrates a block diagram of an example machine 3500 upon which any one or more of the techniques (e.g., methodologies) discussed herein may be performed. In some examples, the machine 3500 may operate as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine 3500 may operate in the capacity of a server machine, a client machine, or both in server-client network environments. In an example, the machine 3500 may act as a peer machine in a peer-to-peer (P2P) (or other distributed) network environment. The machine 3500 may be a personal computer (PC), a server computer, a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein, such as cloud computing, software as a service (SaaS), or other computer cluster configurations.

Examples, as described herein, may include, or may operate by, logic or a number of components, modules, or mechanisms including circuit sets. Circuit sets are a collection of circuits implemented in hardware (e.g., simple circuits, gates, logic, etc.). Circuit set membership may change based on time or underlying hardware availability. Circuit sets include members that may, alone or in combination, perform specified operations when operating. In an example, hardware of the circuit set may be immutably designed to carry out a specific operation (e.g., hardwired). In an example, the hardware of the circuit set may include variably connected physical components (e.g., execution units, transistors, simple circuits, etc.) including a computer readable medium physically modified (e.g., magnetically, electrically, moveable placement of invariant massed particles, etc.) to encode instructions of the specific operation. In connecting the physical components, the underlying electrical properties of a hardware constituent are changed, for example, from an insulator to a conductor or vice versa. The instructions enable embedded hardware (e.g., the execution units or a loading mechanism) to create members of the circuit set in hardware via the variable connections to carry out portions of the specific operation when in operation. Accordingly, the computer readable medium is communicatively coupled to the other components of the circuit set member when the device is operating. In an example, any of the physical components may be used in more than one member of more than one circuit set. For example, under operation, execution units may be used in a first circuit of a first circuit set at one point in time and reused by a second circuit in the first circuit set, or by a third circuit in a second circuit set at a different time.

Machine (e.g., computer system) 3500 may include a hardware processor 3502 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 3504 and a static memory 3506, some or all of which may communicate with each other via an interlink (e.g., bus) 3508. The machine 3500 may further include a display unit 3510, an alphanumeric input device 3512 (e.g., a keyboard), and a user interface (UI) navigation device 3514 (e.g., a mouse). In an example, the display unit 3510, input device 3512 and UI navigation device 3514 may be a touch screen display. The machine 3500 may additionally include a storage device (e.g., drive unit) 3516, a signal generation device 3518 (e.g., a speaker), a network interface device 3520, and one or more sensors 3521, such as a global positioning system (GPS) sensor, compass, accelerometer, or other sensor. The machine 3500 may include an output controller 3528, such as a serial (e.g., universal serial bus (USB)), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate with or control one or more peripheral devices (e.g., a printer, card reader, etc.).

The storage device 3516 may include a machine readable medium 3522 on which is stored one or more sets of data structures or instructions 3524 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein. The instructions 3524 may also reside, completely or at least partially, within the main memory 3504, within static memory 3506, or within the hardware processor 3502 during execution thereof by the machine 3500. In an example, one or any combination of the hardware processor 3502, the main memory 3504, the static memory 3506, or the storage device 3516 may constitute machine readable media. While the machine readable medium 3522 is illustrated as a single medium, the term “machine readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) configured to store the one or more instructions 3524.

The term “machine readable medium” may include any medium that is capable of storing, encoding, or carrying instructions for execution by the machine 3500 and that cause the machine 3500 to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding or carrying data structures used by or associated with such instructions. Non-limiting machine readable medium examples may include solid-state memories, and optical and magnetic media. In an example, a physical machine readable medium comprises a machine readable medium with a plurality of particles having invariant (e.g., rest) mass. Accordingly, machine-readable media are not transitory propagating signals. Specific examples of machine readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

The instructions 3524 may further be transmitted or received over a communications network 3526 using a transmission medium via the network interface device 3520 utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.). Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone Service (POTS) networks, and wireless data networks (e.g., networks implemented according to the Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, the IEEE 802.16 family of standards known as WiMax®, the 3GPP family of standards including Long Term Evolution (LTE)/LTE-Advanced, or the IEEE 802.15.4 family of standards), machine-to-machine/device-to-device/peer-to-peer (P2P) networks, among others. In an example, the network interface device 3520 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more transceivers and antennas to connect to the communications network 3526. In an example, the network interface device 3520 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques. The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine 3500, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software.

SPECIFIC NOTES AND EXAMPLES

Implementation examples of the previously described subject matter correspond to apparatuses, hardware configurations, and related computer programs that carry out the above-described methods. The following examples are provided as illustrative embodiments of the previously described subject matter, with reference to specific operations and structures. It will be understood that significant variation and combination of the following examples may exist to define the scope of the presently described embodiments and any claims encompassing the presently described embodiments.

Collection Data Distribution Examples

Example A1

Subject matter (such as a device, instructions, or a method) for collection managed data distribution comprising: observing a user action point, the user action point to receive user input; providing a user interface to a user to create a collection in response to receipt of the user input at the user action point; receiving an indication from the user to create the collection via the user interface; identifying a collection type based on the indication and a context of the user interface; obtaining a collection schema for the collection type; receiving identification of a set of file system elements for the collection; populating a plurality of collection definitions of the collection schema in accordance with the indication, the context, and file system element identifications for the set of file system elements; communicating a portion of the collection schema to a plurality of nodes participating in the collection, the portion including a list of participant nodes and the file system element identifications; receiving, at a data distribution mechanism of a node from the plurality of nodes, the portion of the collection schema; synchronizing, by the node, an event stream with a participant node in the list of participant nodes, the event stream including indications of changes of the file system elements between the participant nodes; identifying, from a local data store of the node, a state of a file system element identified in the file system element identifications, the state being one of a plurality of states, the state corresponding to a distribution action; issuing, by the node in response to identifying the state, a communication to the list of participant nodes to complete the distribution action; receiving a response to the communication from a participant node in the list of participant nodes; and completing the data distribution action using content from the response.

Example A2

The subject matter of Example A1, wherein observing the user action point and providing the user interface are performed by the node.

Example A3

The subject matter of any of Examples A1-A2, wherein the user action point is a user interface element added to a file browser application.

Example A4

The subject matter of any of Examples A1-A3, wherein the user action point is a user interface element provided by the data distribution mechanism.

Example A5

The subject matter of any of Examples A1-A4, wherein the user action point is a web interface.

Example A6

The subject matter of any of Examples A1-A5, wherein the collection type is one of a backup collection, a personal collection, or a multi-user collection.

Example A7

The subject matter of any of Examples A1-A6, wherein obtaining a collection schema, receiving identification of the set of file system elements, populating the plurality of collection definitions, and communicating the portion of the collection schema are performed by an authority, the authority being a computer system that is distinct from the node.

Example A8

The subject matter of any of Examples A1-A7, wherein: the state of the file system element indicates that the file system element is not current in the local data store; the data distribution action includes causing the file system element to be current in the local data store; the communication is a request for contents of a current version of the file system element; the response to the communication includes at least a portion of the contents; and completing the data distribution action includes using the at least a portion of the contents to establish the current version of the file system element in the local data store.

Example A9

The subject matter of any of Examples A1-A8, wherein the at least a portion of the contents are incremental differences between the current version of the file system element and a present version of the file system element.

Example A10

The subject matter of any of Examples A1-A9, wherein: the state of the file system element indicates that the file system element has changed in the local data store; the data distribution action includes notifying the list of participant nodes about the file system element change; the communication is an event published to the list of participant nodes; the response is a request for contents of the file system element; and completing the data distribution action includes transmitting the contents to the participant node.

Example A11

A computer system comprising modules configured to perform the operations of any one or more of Examples A1-A10.

Example A12

A non-transitory computer readable medium comprising instructions that, when executed by a processor of a computing system, configure the computing system to perform operations of any one or more of Examples A1-A10.

Node-to-Node Data Distribution Examples

Example B1

Subject matter (such as a device, instructions, or a method) to create a data distribution network comprising: at a first computing device using one or more processors to perform operations of: sending a connection message to an authority node over a computer network; receiving information on a first set of nodes from the authority node, the first set of nodes being a subset of an entire set of nodes that are participating in a file system element collection of the data distribution network; discovering a second set of nodes based upon network messaging with respective nodes of the second set of nodes and based upon the received information on the first set of nodes, the second set of nodes comprising members of the first set of nodes that are communicatively reachable; connecting to a first node and a second node of the second set of nodes, the first and second nodes selected from the second set of nodes according to a predetermined connection algorithm; synchronizing an event stream with at least one of the first and second nodes; and responsive to synchronizing the event stream, downloading at least one file system element corresponding to a file system event of the synchronized event stream from at least one of the first and second nodes.

Example B2

The subject matter of Example B1, wherein the connection algorithm comprises selecting the first and second nodes from the second set of nodes based upon a unique node id of each node in the second set and a unique identification of the first computing device.

Example B3

The subject matter of any of Examples B1-B2, wherein selecting the first and second nodes comprises: searching the second set of nodes for a particular node with a unique identification that is the smallest unique identification that is still greater than the unique identification of the first computing device; and responsive to finding the particular node, selecting the particular node.

Example B4

The subject matter of any of Examples B1-B2, wherein selecting the first and second nodes comprises: searching the second set of nodes for a particular node with a unique identification that is the smallest unique identification that is still greater than the unique identification of the first computing device; and responsive to determining that no such node in the second set of nodes exists, selecting the node with the lowest unique identification in the second set of nodes.

Example B5

The subject matter of any of Examples B1-B4, wherein selecting the first and second nodes comprises: searching the second set of nodes for a particular node with a unique identification that is the largest unique identification that is still smaller than the unique identification of the first computing device; and responsive to finding the particular node, selecting the particular node.

Example B6

The subject matter of any of Examples B1-B4, wherein selecting the first and second nodes comprises: searching the second set of nodes for a particular node with a unique identification that is the largest unique identification that is still smaller than the unique identification of the first computing device; and responsive to determining that no such node in the second set of nodes exists, selecting the node with the highest unique identification in the second set of nodes.
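The neighbor selection of Examples B3 through B6 amounts to picking the nearest higher and nearest lower identifier among the reachable nodes, wrapping around at either end. The Python sketch below illustrates that reading; it assumes identifiers are mutually comparable values (integers here), and the function names are hypothetical rather than taken from the described system.

```python
def select_successor(own_id, reachable_ids):
    """Smallest id greater than own_id (B3); else the lowest id (B4)."""
    larger = [i for i in reachable_ids if i > own_id]
    if larger:
        return min(larger)
    # No larger id exists, so wrap around to the lowest id in the set.
    return min(reachable_ids) if reachable_ids else None


def select_predecessor(own_id, reachable_ids):
    """Largest id smaller than own_id (B5); else the highest id (B6)."""
    smaller = [i for i in reachable_ids if i < own_id]
    if smaller:
        return max(smaller)
    # No smaller id exists, so wrap around to the highest id in the set.
    return max(reachable_ids) if reachable_ids else None


def select_neighbors(own_id, reachable_ids):
    """Return (successor, predecessor) among the reachable peers."""
    # Exclude the node's own id in case it appears in the peer list.
    peers = [i for i in reachable_ids if i != own_id]
    return select_successor(own_id, peers), select_predecessor(own_id, peers)
```

For instance, a node with id 50 and reachable peers 10, 30, 70, and 90 would select 70 and 30, while a node with id 95 would wrap around and select 10 and 90. When every node applies the same rule, the connections tend to form a ring ordered by identifier.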

Example B7

The subject matter of any one of Examples B1-B6, wherein the connection algorithm connects to all nodes in the second set of nodes including the first and second nodes.

Example B8

The subject matter of any one of Examples B1-B6, wherein the connection algorithm connects to a subset of all the nodes in the second set of nodes including the first and second nodes.

Example B9

The subject matter of any one of Examples B1-B8, wherein downloading at least one file system element corresponding to a file system event of the synchronized event stream from at least one of the connected nodes comprises: determining that the first node is on a same local area network as the first computing device and the second node of the at least two connected nodes is not on the same local area network and, in response, downloading the at least one file system element from the first node.
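The preference in Example B9 for a peer on the same local area network can be sketched as follows. The sketch makes a simplifying assumption that is not in the source: a peer counts as being on the same LAN when its IP address falls inside the local subnet. The function name and the `(peer_id, address)` tuples are hypothetical.

```python
import ipaddress


def pick_download_source(peers, local_network):
    """Prefer a connected peer whose address is inside the local subnet.

    peers: list of (peer_id, ip_address_string) for connected nodes.
    local_network: CIDR string for the local subnet (an assumption used
    here as a stand-in for "same local area network").
    """
    network = ipaddress.ip_network(local_network)
    for peer_id, address in peers:
        if ipaddress.ip_address(address) in network:
            return peer_id
    # No LAN peer is connected; fall back to the first connected peer.
    return peers[0][0] if peers else None
```

With peers `[("n2", "203.0.113.7"), ("n1", "192.168.1.20")]` and a local network of `192.168.1.0/24`, the sketch selects `n1` even though `n2` is listed first.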

Example B10

A computer system comprising modules configured to perform the operations of any one or more of Examples B1-B9.

Example B11

A non-transitory computer readable medium comprising instructions that, when executed by a processor of a computing system, configure the computing system to perform operations of any one or more of Examples B1-B9.

Deduplicated Data Distribution Examples

Example C1

Subject matter (such as a device, instructions, or a method) for establishing a deduplication-based reconstruction of file system data, the method comprising operations performed by at least one processor of a first computing system, and the operations including: transmitting, to a second computing system, a request for metadata of a desired file; receiving, from the second computing system, the metadata of the desired file, the metadata indicating respective identifiers of each block of the desired file; identifying, at the first computing system with use of the metadata, one or more blocks of the desired file on a data store associated with the first computing system; and reconstructing the desired file at the first computing system from the one or more blocks of the desired file on a data store, the reconstructing performed with use of the metadata received from the second computing system.

Example C2

The subject matter of Example C1, wherein each of the blocks of the desired file are stored in a matching source file on the data store associated with the first computing system, the operations of reconstructing the one or more blocks of the desired file comprising: retrieving each of the blocks of the desired file from the matching source file on the data store associated with the first computing system; and writing each of the blocks retrieved from the matching source file to a file system location in a destination data store associated with the first computing system.

Example C3

The subject matter of Example C1, wherein one or more of the blocks of the desired file are not stored in the data store associated with the first computing system, the operations for reconstructing the one or more blocks of the desired file comprising: transmitting, to the second computing system, a request for the one or more of the blocks of the desired file that are not stored in the data store associated with the first computing system, the request indicating respective identifiers of the one or more blocks of the desired file that are not stored in the data store associated with the first computing system; receiving, from the second computing system, the one or more of the blocks of the desired file that are not stored in the data store associated with the first computing system; writing the one or more of the blocks of the desired file received from the second computing system to a destination data store on the first computing system; retrieving at least one remaining block of the desired file from a source file located in the data store associated with the first computing system; and writing the at least one remaining block of the desired file to a file system location in a destination data store associated with the first computing system.

Example C4

The subject matter of Example C1, the operations comprising: determining, from an index associated with the first computing system, that one or more other blocks of the desired file do not exist on the data store associated with the first computing system, the determining performed using the metadata received from the second computing system; obtaining the one or more other blocks of the desired file from the second computing system; and obtaining the one or more blocks of the desired file from the data store associated with the first computing system; wherein the operations of reconstructing the desired file at the first computing system include reconstructing the desired file from the one or more blocks of the desired file obtained from the data store associated with the first computing system and the one or more other blocks of the desired file obtained from the second computing system.

Example C5

The subject matter of Example C4, wherein the determining that the one or more other blocks of the desired file do not exist on the data store associated with the first computing system is performed with use of a bloom filter cache, the bloom filter cache operated using at least a portion of the respective identifiers of each block of the desired file.
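A Bloom filter suits the check in Example C5 because it never reports a stored block as absent: a negative answer is definitive, while a positive answer still needs confirmation against the block index. The Python sketch below is a generic Bloom filter, not the cache described in the source; the class name, the default sizes, and the use of salted SHA-256 digests as hash functions are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte-string block identifiers."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A node could add the identifier of every locally stored block to such a filter and consult it before the slower index lookup; any identifier for which `might_contain` returns `False` can be requested from the second computing system right away.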

Example C6

The subject matter of any of Examples C1-C5, wherein identifying the one or more blocks of the desired file is performed with use of a block index, the block index providing respective identifiers of a plurality of blocks located within files stored on the data store associated with the first computing system.

Example C7

The subject matter of any of Examples C1-C6, the operations comprising: validating the desired file in response to the reconstructing, the validating performing a comparison of a digital signature of the desired file that is provided from the reconstructing with a digital signature of the desired file that is provided from the metadata of the desired file.

Example C8

The subject matter of Example C7, wherein the respective identifiers of each block of the desired file are based at least in part on an MD5 hash value of each block and wherein the digital signature for the desired file is based at least in part on an SHA-2 hash value of the desired file.
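Examples C1 through C8 together describe rebuilding a file from locally held blocks, fetching only the blocks that are missing, and validating the result. The Python sketch below illustrates that flow under several assumptions that are not in the source: fixed-size blocks (4 bytes, purely for readability), a metadata dictionary with `block_ids` and `signature` keys, an in-memory block index that maps identifiers directly to block bytes instead of to file locations, SHA-256 as the SHA-2 variant, and a caller-supplied `fetch_remote` function standing in for the second computing system.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, for illustration only


def split_blocks(data, block_size=BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_id(block):
    # Block identifier based on an MD5 hash, as in Example C8.
    return hashlib.md5(block).hexdigest()


def make_metadata(data):
    """Metadata a remote system might return for a desired file."""
    return {"block_ids": [block_id(b) for b in split_blocks(data)],
            "signature": hashlib.sha256(data).hexdigest()}


def build_block_index(local_files):
    """Index every block found in local files by its identifier."""
    index = {}
    for data in local_files.values():
        for block in split_blocks(data):
            index.setdefault(block_id(block), block)
    return index


def reconstruct(metadata, block_index, fetch_remote):
    """Rebuild the file, fetching only blocks absent from the index."""
    missing = [b for b in metadata["block_ids"] if b not in block_index]
    fetched = fetch_remote(missing) if missing else {}
    parts = [block_index[b] if b in block_index else fetched[b]
             for b in metadata["block_ids"]]
    data = b"".join(parts)
    # Validate the result against the signature in the metadata (C7).
    if hashlib.sha256(data).hexdigest() != metadata["signature"]:
        raise ValueError("reconstructed file failed signature validation")
    return data, missing
```

If the desired file is `b"AAAABBBBCCCC"` and a local file already holds `b"AAAAXXXXCCCC"`, only the middle block has to cross the network; the other two are copied from the local data store.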

Example C9

A computer system comprising modules configured to perform the operations of any one or more of Examples C1-C8.

Example C10

A non-transitory computer readable medium comprising instructions that, when executed by a processor of a computing system, configure the computing system to perform operations of any one or more of Examples C1-C8.

Predictive Storage Examples

Example D1

Example D1 includes subject matter (such as a method, means for performing acts, machine readable medium including instructions that, when performed by a machine, cause the machine to perform acts, or an apparatus configured to perform) for predictive data storage on a data distribution system comprising: scoring respective ones of a plurality of file system elements of a collection of the data distribution system based upon a calculated probability that a user of a node is likely to interact with the respective element; determining an on-demand subset of the collection based upon the scores of the respective plurality of elements, wherein the on-demand subset contains fewer elements than the collection; determining that at least one file system element of the on-demand subset is not already in a local storage of the node; and responsive to determining that the at least one of the on-demand subset is not already in the local storage of the node, requesting the element from a second node in the data distribution system over a computer network.

Example D2

In example D2, the subject matter of example D1 may optionally include wherein the second node is a peer node.

Example D3

In example D3, the subject matter of example D1 may optionally include wherein the second node is an authority node.

Example D4

In example D4, the subject matter of any one or more of examples D1-D3 may optionally include wherein scoring comprises utilizing prior usage history corresponding to at least one of the plurality of file system elements.

Example D5

In example D5, the subject matter of example D4 may optionally include wherein the prior usage history is specific to the user of the node.

Example D6

In example D6, the subject matter of example D4 may optionally include wherein the prior usage history is a usage history corresponding to all users of the collection.

Example D7

In example D7, the subject matter of any one or more of examples D1-D6 may optionally include wherein prior usage history corresponding to the user of the node is weighted greater than the prior usage history of other users of the collection.
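One simple way to read Examples D4 through D7 is as a weighted count of past interactions, normalized so that the scores can stand in for probabilities. The Python sketch below takes that reading; the function name, the default weights, and the `(user, element)` log format are hypothetical, and a weighted share is only a crude estimate of the calculated probability the examples refer to.

```python
from collections import Counter


def score_elements(usage_log, user, own_weight=3.0, other_weight=1.0):
    """Score elements from prior usage, favoring the node's own user.

    usage_log: iterable of (user_name, element_name) interactions.
    Returns a dict mapping each element to its normalized weighted
    share of all interactions.
    """
    weighted = Counter()
    for who, element in usage_log:
        # The node user's history counts more than other users' (D7).
        weighted[element] += own_weight if who == user else other_weight
    total = sum(weighted.values())
    if not total:
        return {}
    return {element: weight / total for element, weight in weighted.items()}
```

With the default weights, one interaction by the node's own user outweighs two interactions by another user, so an element the local user touched once would score 0.6 against 0.4 for an element touched twice by someone else.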

Example D8

In example D8, the subject matter of any one or more of examples D1-D7 may optionally include wherein scoring comprises utilizing contextual data specifying a context corresponding to a prior usage history.

Example D9

In example D9, the subject matter of any one or more of examples D1-D8 may optionally include wherein the prior usage history comprises an interaction by a particular user of the collection with one of the plurality of elements in the collection and wherein the contextual data signals a particular situation that the particular user was in when the particular user interacted with the one of the plurality of elements.

Example D10

In example D10, the subject matter of any one or more of examples D1-D9 may optionally include building a machine learning model; and wherein scoring comprises using the machine learning model.

Example D11

In example D11, the subject matter of any one or more of examples D1-D10 may optionally include wherein determining an on-demand subset of the collection based upon the scores of the respective plurality of elements comprises selecting the highest scoring elements in the on-demand set.

Example D12

In example D12, the subject matter of any one or more of examples D1-D11 may optionally include wherein selecting the highest scoring elements in the on-demand set comprises selecting the highest scoring elements in the on-demand set until a predetermined limit on one of: a local storage size and a number of elements in the on-demand set has been reached.
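
By way of illustration only, the limit-bounded selection of example D12 might be sketched as follows. The `Element` record, its field names, and the `select_highest_scoring` function are hypothetical and are not part of the described subject matter. The sketch also assumes that selection stops at the first element that would exceed the storage limit, which is only one possible reading of "until a predetermined limit has been reached."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical record for a scored file system element."""
    name: str
    size: int     # size in bytes (assumed unit)
    score: float  # result of the scoring step


def select_highest_scoring(
    elements: List[Element],
    max_total_size: Optional[int] = None,
    max_count: Optional[int] = None,
) -> List[Element]:
    """Take elements in descending score order until a limit is reached."""
    chosen: List[Element] = []
    used = 0
    for element in sorted(elements, key=lambda e: e.score, reverse=True):
        # Stop once the limit on the number of elements has been reached.
        if max_count is not None and len(chosen) >= max_count:
            break
        # Stop at the first element that would exceed the storage limit
        # (an assumption; skipping it and continuing is another option).
        if max_total_size is not None and used + element.size > max_total_size:
            break
        chosen.append(element)
        used += element.size
    return chosen
```

Either limit may be omitted, in which case only the other limit bounds the selection.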

Example D13

In example D13, the subject matter of any one or more of examples D1-D12 may optionally include wherein determining an on-demand subset of the collection based upon the scores of the respective plurality of elements comprises selecting a combination of the respective plurality of elements that results in the highest combined score of selected elements given one of: a size constraint on the maximum number of elements in the on-demand set and a size constraint on the maximum total size of the elements in the on-demand set.
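
By way of illustration only, selecting the combination with the highest combined score under a total-size constraint, as in example D13, resembles a 0/1 knapsack problem. The following is a minimal sketch under that assumption, using dynamic programming over integer sizes. The function name, the tuple layout, and the use of integer size units are hypothetical choices made for the sketch, not the described implementation.

```python
from typing import List, Tuple


def select_best_combination(
    elements: List[Tuple[str, int, float]],
    max_total_size: int,
) -> Tuple[float, Tuple[str, ...]]:
    """Return (combined score, chosen names) maximizing the combined score.

    Each element is a (name, size, score) tuple; sizes are assumed to be
    non-negative integers in the same unit as max_total_size.
    """
    # best[cap] holds the best (score, names) achievable within capacity cap.
    best: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())] * (max_total_size + 1)
    for name, size, score in elements:
        # Iterate capacities downward so each element is used at most once.
        for cap in range(max_total_size, size - 1, -1):
            candidate = best[cap - size][0] + score
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] + (name,))
    return best[max_total_size]
```

Unlike a selection that simply takes elements in descending score order, this approach may prefer several smaller elements over one larger, higher-scoring element when their combined score is greater. When the constraint is instead on the number of elements, the highest combined score is obtained by taking the highest scoring elements up to that number.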

Example D14

In example D14, the subject matter of any one or more of examples D1-D13 may optionally include receiving the element from the second node; and responsive to receiving the element from the second node, storing the element in the local storage of the node.

Example D15

A computer system comprising modules configured to perform the operations of any one or more of examples D1-D14.

Example D16

A non-transitory computer readable medium comprising instructions that, when executed by a processor of a computing system, configure the computing system to perform operations of any one or more of examples D1-D14.

What is claimed is:
1. At least one machine readable medium that is not a transitory propagating signal, the machine readable medium including instructions that, when executed by hardware of a node, cause the node to perform operations comprising: receiving a set of peer nodes from a collection authority node managing a collection, the node and the set of peer nodes being members of the collection; selecting a subset of peer nodes from the set of peer nodes; attempting to establish communications with each of the subset of peer nodes, wherein connected peers include peers from the subset of peer nodes where the attempt to establish communications was successful; and synchronizing an event stream with at least one connected peer.
2. The machine readable medium of claim 1, wherein the set of peer nodes is ordered, and wherein the subset of peer nodes is selected based on the order of the set of peer nodes.
3. The machine readable medium of claim 2, wherein the set of peer nodes is ordered by a connectivity characteristic.
4. The machine readable medium of claim 3, wherein the connectivity characteristic is at least one of network bandwidth, network performance, cost per byte transferred, latency, power source, power remaining, processing power, storage capacity, or proximity to the node.
5. The machine readable medium of claim 4, wherein proximity to the node is based on network type, and wherein a local area network (LAN) network type is assigned a high order, wherein the LAN network type is local to the node.
6. The machine readable medium of claim 2, wherein the set of peer nodes is ordered by an implemented feature set defining features of the nodes.
7. The machine readable medium of claim 2, wherein the set of peer nodes are ordered by the collection authority node.
8. The machine readable medium of claim 7, wherein an order of a peer node in the set of peer nodes is denoted by a unique identification (ID), and wherein being selected based on their order includes the node comparing its unique ID to those of the set of peer nodes.
9. The machine readable medium of claim 8, wherein comparing the unique ID of the node to those of the set of peer nodes includes splitting the set of peer nodes at the unique ID of the node, the splitting resulting in an upper half and a lower half, the upper half including peer nodes with a unique ID greater than the unique ID of the node and the lower half including peer nodes with a unique ID lower than the unique ID of the node.
10. The machine readable medium of claim 1, wherein the operations further comprise sending a connection message to the collection authority node, and wherein the set of peer nodes is received in response to the connection message.
11. The machine readable medium of claim 10, wherein the connection message includes connectivity characteristics of the node.
12. The machine readable medium of claim 11, wherein the operations further comprise periodically sending the connectivity characteristics of the node to the collection authority node.
13. The machine readable medium of claim 1, wherein the set of peer nodes includes connection information for members of the set of peer nodes.
14. The machine readable medium of claim 13, wherein the connection information includes at least one of an address, a protocol, or authentication information.
15. The machine readable medium of claim 14, wherein the connection information includes all of the address, the protocol, and the authentication information.
16. The machine readable medium of claim 1, wherein synchronizing the event stream includes comparing a local event log against remote event logs and rectifying differences between the local event log and the remote event logs.
17. The machine readable medium of claim 16, wherein comparing the local event log against the remote event log includes comparing a version vector of the local event log against a version vector for each of the remote event logs.
18. The machine readable medium of claim 16, wherein rectifying the differences between the local event log and remote event logs includes transmitting a missing event to at least one node in the subset of peer nodes.
19. The machine readable medium of claim 16, wherein rectifying the differences between the local event log and remote event logs includes receiving a missing event from the local event log from a peer node in the subset of peer nodes.
20. The machine readable medium of claim 1, wherein all members of the collection are a proper superset of the set of peer nodes, the set of peer nodes being a preferred participant node subset as designated by the collection authority node, wherein all members of the collection are communicated to the node from the collection authority node.
21. The machine readable medium of claim 20, wherein a cardinality of the set of peer nodes is below a threshold, and wherein the set of peer nodes is selected from all members of the collection based on an ordering of the members of the collection.
22. The machine readable medium of claim 1, wherein a connection is lost to a member of the connected peers, and wherein the operations further comprise adding a member of the set of peer nodes that is not in the subset of peer nodes to the connected peers after a successful connection attempt is made.
23. The machine readable medium of claim 1, wherein the operations further comprise at least one of transferring or receiving all or a portion of a file system element across the event stream, the file system element being one of a plurality of file system elements of the collection, the collection defining a local root for each file system element and member of the collection.
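
By way of illustration only, and without limiting the claims, the event log comparison and rectification recited in claims 16 through 19 might be sketched as follows. The `Event` record, the representation of a version vector as a mapping from an originating node to the highest sequence number seen from that node, and all function names are assumptions made for the sketch. It further assumes that sequence numbers from each originating node start at 1 and are recorded without gaps.

```python
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical event record in an event log."""
    origin: str   # identifier of the node that created the event
    seq: int      # per-origin sequence number, assumed to start at 1
    payload: str


def version_vector(log: List[Event]) -> Dict[str, int]:
    """Summarize a log as the highest sequence number seen per origin."""
    vector: Dict[str, int] = {}
    for event in log:
        vector[event.origin] = max(vector.get(event.origin, 0), event.seq)
    return vector


def events_missing_from(log: List[Event], other: Dict[str, int]) -> List[Event]:
    """Return events in log that the holder of the other vector lacks."""
    return [e for e in log if e.seq > other.get(e.origin, 0)]


def synchronize(
    local_log: List[Event], remote_log: List[Event]
) -> Tuple[List[Event], List[Event]]:
    """Compare version vectors, then exchange the missing events."""
    local_vector = version_vector(local_log)
    remote_vector = version_vector(remote_log)
    to_send = events_missing_from(local_log, remote_vector)     # transmit
    to_receive = events_missing_from(remote_log, local_vector)  # receive
    remote_log.extend(to_send)
    local_log.extend(to_receive)
    return to_send, to_receive
```

In this sketch both logs are held in memory for simplicity. In a deployed system the remote log would reside on a connected peer, and only the version vectors and the missing events would cross the network.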